Abstract

Although the structural properties of online social networks have attracted much attention, the properties of close-knit friendship structures remain an important open question. Here, we focus on how these mesoscale structures are affected by the local and global structural properties. Analyzing data from four large-scale online social networks reveals several common structural properties. Not only do the local structures, given by the indegree, outdegree, and reciprocal degree distributions, follow a similar scaling behavior, but the mesoscale structures, represented by the distributions of close-knit friendship structures, also exhibit a similar scaling law. The degree correlation is very weak over a wide range of degrees. We propose a simple directed network model that captures the observed properties. The model incorporates two mechanisms: reciprocation and preferential attachment. Through a rate equation analysis of the model, the local-scale and mesoscale structural properties are derived. At the local scale, the same scaling behavior of the indegree and outdegree distributions stems from the indegree and outdegree of nodes both growing as the same function of the introduction time, and the reciprocal degree distribution shows the same power law owing to the linear relationship between the reciprocal degree and the in/outdegree of nodes. At the mesoscale, the distributions of four closed triples representing close-knit friendship structures are found to exhibit identical power laws, a behavior attributed to the negligible degree correlations. Intriguingly, all the power-law exponents of the distributions at the local scale and the mesoscale depend only on one global parameter, the mean in/outdegree, while the mean in/outdegree and the reciprocity together determine the ratio of the reciprocal degree of a node to its in/outdegree. Structural properties of numerically simulated networks are analyzed and compared with each of the four real networks. This work helps to understand the interplay between structures on different scales in online social networks.

Introduction 

In recent years, an increasing number of online social systems (e.g., YouTube and Facebook) have been attracting wide attention from different fields [1-3]. Online social networks provide a platform for web users to make acquaintance with congenial friends [4], exchange photos and personal news [5], share videos [6], establish communities or forums on focused issues [7], etc. These online interactive behaviors, which partly reflect real-life social relationships among people, provide an unprecedented opportunity to study and understand the diverse characteristics of real-life social systems [8, 9].

Complex network theory has proven to be a powerful framework for understanding the structure and dynamics of complex systems [10-16]. Online social systems have been treated as undirected networks [17, 18], an approach that has been applied successfully in exploring various systems [10]. This simplification, however, cannot describe the asymmetric interactions among users. Taking Flickr as an example, if a user A designates another user B as a friend, user A can see the photos of user B, but not the other way round unless user B also designates user A as a friend. Technically, an asymmetric interaction represents one directed link, and many online social systems are thus directed networks in nature. The directionality of links is important in characterizing the functioning of many systems, e.g., the leadership structure of social reputation [19, 20], reciprocal behavior in evolutionary games [21], the information hierarchy of the World Wide Web [22, 23], and the citation relationships of scientific publications [24, 25]. Much effort has been devoted to understanding the structural properties of these directed networks, including the indegree and outdegree distributions [26], the average shortest distance [26], degree correlation [27], and community structure [28-30]. Correspondingly, many models have been proposed for the mechanisms underlying these statistical properties. Dorogovtsev et al. [31] generalized the Barabási-Albert (BA) model [32] and obtained the exact form of the indegree distribution of growing networks in the thermodynamic limit. Krapivsky et al. [33] introduced a directed network model that generates correlated indegree and outdegree distributions. Zhou et al. argued that the "good get richer" mechanism would facilitate the emergence of a scale-free leadership structure in online social networks.

Up to now, most work on complex networks can be classified into studies at three scales: the local scale, based on single-node properties (through statistical distributions); the macro-scale, based on the global properties of networks (with global parameters); and the mesoscale, based on properties of groups of nodes (via modular properties) [34-36]. However, the majority of studies have focused on the first two scales. In view of the significant role of modularity in the functionality of real networks, it has become increasingly important to study mesoscale structures. Communities and motifs are two key mesoscale structures of real complex networks. Community structures are ubiquitous in a variety of real complex systems [37, 38], such as Facebook, YouTube, and Xiaonei. There are more connections among members of the same community than among members of different communities. Lancichinetti et al. analyzed the statistical properties of communities in five categories of real complex networks and found that communities detected in networks of the same category display similar structural characteristics [39]. Motifs, defined as subgraphs that occur much more often than expected in a random network, play a significant role in our understanding of the interplay between the structure and dynamics of real complex networks [40-45].

In spite of the structural features revealed at the three scales, understanding the interplay between the different scales has remained a major challenge [34-36]. In the present work, we study how the close-knit friendship structures of online social networks at the mesoscale and the structural properties at the other two scales affect each other. In social networks, the close-knit friendship structure describes the closest unit, which is usually represented by closed triples. In a directed network, there are 13 different possible connected three-node subgraphs [41]. For situations without reciprocal links, a focal node has three possible unclosed triples. Each unclosed triple can be closed by adding a directed link between the two unconnected nodes, giving rise to four types of closed triples, as shown in Figure 1 [44, 45]. The four closed triples fall into two groups: one is a feedback (FB) loop and the other three are feedforward (i.e., FFa, FFb, and FFc) loops. Structurally, the roles of the three nodes in the FB loop are equivalent, but this is not the case in the FF loops. Any FFa loop (from the perspective of the focal node) is an FFb loop for another node and an FFc loop for the third node, and thus the numbers of the three types of feedforward loops are equal in directed networks. Compared to unclosed triples, closed triples play a more important role in dynamical processes on online social networks [46, 47], such as opinion formation [48], game dynamics [49], and cooperation evolution [50].
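The equality of the three feedforward counts is a bookkeeping identity: every FF loop (source s, middle node m, sink t, with links s→m, m→t and the shortcut s→t) is seen exactly once from each of its three nodes. The minimal Python sketch below counts the closed triples around every node, assuming the network is given as a set of (source, target) pairs with no self-loops. Because Figure 1 is not reproduced here, the mapping of FFa, FFb, and FFc onto the source, middle, and sink roles is left open; the role names in the code are hypothetical labels, and reciprocal links are not treated specially.

```python
from collections import defaultdict


def closed_triples(edges):
    """Count, for every node, the closed triples in which it takes part.

    edges: iterable of (source, target) pairs without self-loops.
    Returns {node: {"FB": ..., "FF_source": ..., "FF_middle": ..., "FF_sink": ...}}.
    The FF role names are hypothetical labels; which one is FFa, FFb or FFc
    depends on the convention of Figure 1.
    """
    edges = set(edges)
    out_nb, in_nb = defaultdict(set), defaultdict(set)
    for i, j in edges:
        out_nb[i].add(j)
        in_nb[j].add(i)
    counts = {}
    for i in set(out_nb) | set(in_nb):
        fb = middle = source = sink = 0
        for a in in_nb[i]:
            for b in out_nb[i]:
                if a == b:
                    continue
                fb += int((b, a) in edges)      # a -> i -> b -> a (feedback loop)
                middle += int((a, b) in edges)  # a -> i -> b with shortcut a -> b
        for a in out_nb[i]:
            source += len(out_nb[a] & out_nb[i])  # i -> a -> b with shortcut i -> b
        for a in in_nb[i]:
            sink += len(out_nb[a] & in_nb[i])     # a -> b -> i with shortcut a -> i
        counts[i] = {"FB": fb, "FF_source": source,
                     "FF_middle": middle, "FF_sink": sink}
    return counts
```

Summed over all nodes, the three FF role counts coincide for any simple directed graph, while the FB total is three times the number of distinct feedback loops, since each loop is counted once at each of its nodes.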

In online social networks, closed triples are a good indicator of close-knit friendships among people. To understand the mesoscale structural properties of online social networks, we analyze four large-scale online social networks, namely Epinions, Slashdot, Flickr, and YouTube, establish that the distributions at each scale follow similar power laws, and introduce a simple directed network model incorporating two processes: external reciprocation and internal evolution. Theoretical analysis shows that the distributions of the four closed triples display almost identical scaling laws owing to the negligible degree correlations, and that the distribution exponents depend only on one global parameter, the mean in/outdegree. Simulation results based on the model are basically consistent with both the empirical results and the theoretical analysis.



Results 

Empirical Results 

We first analyze four representative directed online social networks and establish the empirical features. 



As listed in Table 1, these four datasets are: (i) Epinions Social Network (ESN, http://snap.stanford.edu/data/soc-Epinion): a who-trust-whom online social network of the general consumer review site Epinions.com, in which members can decide whether or not to "trust" each other; all the trust relationships form a so-called social trust network. (ii) Slashdot Social Network (SSN, http://snap.stanford.edu/data/soc-Slashdot0902.html) [51]: a friendship network of the technology-related news website Slashdot.com. Nodes are the users and links represent the friendships among the users. (iii) Flickr Social Network (FSN, http://socialnetworks.mpi-sws.org/data-imc2007.): a friendship network of the photo-sharing site Flickr.com, which allows users to designate others as "contacts" or "friends" and track their activities in real time. This network contains all the friendship links among the users of Flickr. (iv) YouTube Social Network (YSN, http://socialnetworks.mpi-sws.org/data-imc2007.html) [52]: a friendship network of the popular video-sharing website YouTube.com, on which users can upload, share, and view videos. The nodes in the network are the users of YouTube, and a directed link is established from a user A to a user B when user A declares user B as a friend. Table 1 summarizes the basic global features of the four online social networks. These networks all show a large reciprocity $r$, defined by $r = E_r/(E - E_r)$ [53], with $E_r$ and $E$ being the numbers of reciprocal links and single directed links, respectively. Note that a reciprocal link contributes two single directed links. For example, $r \approx 0.25$ for ESN, $r \approx 0.73$ for SSN, $r \approx 0.45$ for FSN, and $r \approx 0.65$ for YSN.
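The reciprocity definition can be checked directly on an edge list. The small Python sketch below assumes the network is given as (source, target) pairs with no self-loops; it derives the indegree, outdegree, and reciprocal degree of every node and computes $r = E_r/(E - E_r)$, counting each reciprocal pair once in $E_r$. The function name `degree_stats` is ours, not from the paper.

```python
from collections import Counter


def degree_stats(edges):
    """Return (k_in, k_out, k_r, r) for a directed edge list.

    edges: iterable of (source, target) pairs; duplicates are merged and
    self-loops are assumed absent. r = E_r / (E - E_r), where E counts
    single directed links and E_r counts reciprocal links (pairs).
    """
    edges = set(edges)
    k_in, k_out, k_r = Counter(), Counter(), Counter()
    for i, j in edges:
        k_out[i] += 1
        k_in[j] += 1
        if (j, i) in edges:
            k_r[i] += 1  # node j is credited when (j, i) itself is visited
    n_reciprocal = sum(k_r.values()) // 2  # each pair is seen from both ends
    r = n_reciprocal / (len(edges) - n_reciprocal)
    return k_in, k_out, k_r, r
```

For instance, the three links 0→1, 1→0, and 0→2 contain one reciprocal link, so $E = 3$, $E_r = 1$, and $r = 0.5$.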

We also studied the local-scale structural properties of these social networks via statistical distributions. The results for ESN are presented as an example. Figure 2 shows the indegree and outdegree distributions (black squares) on a log-log plot. The data span more than two decades. The distributions follow a power law with approximately the same exponent, i.e., $P(k_{in}) \sim k_{in}^{-\gamma_{in}}$ and $P(k_{out}) \sim k_{out}^{-\gamma_{out}}$, with $\gamma_{in} \approx 1.73$ and $\gamma_{out} \approx 1.71$ obtained by maximum likelihood estimation [54, 55]. More details about the power-law fits are given in Table S1 of Appendix S1. Figure 3 shows that the indegree $k_{in}$ of each node is nearly proportional to its outdegree $k_{out}$ (also see Figures S4-S6 of Appendix S1), which is consistent with the similar scaling laws of their distributions. In growing networks, the fat-tailed power-law behavior of the degree distribution suggests that directed links are not drawn toward and from existing users uniformly. Mislove et al. showed that there is a positive correlation between the number of links a user has and its probability of creating or receiving new links in online social networks [5]. This phenomenon is called "preferential attachment" [5, 32, 33, 56]. The behavior $k_{in} \sim k_{out}$ for any node implies that a node with large $k_{in}$ has a strong ability to attract links from other nodes and also a strong tendency to link to other nodes. This is reminiscent of the product $k^i_{out}k^j_{in}$ used in predicting a link between nodes $i$ and $j$ [57], i.e., a larger product gives a larger probability of having a directed link from $i$ to $j$. These results lead us to incorporate a preferential attachment mechanism related to $k_{out}k_{in}$ into the mechanism of how links grow in a network.
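The exponents above come from maximum likelihood estimation [54, 55]. As an illustration only, the Python sketch below implements the standard continuous estimator $\hat\gamma = 1 + n\left[\sum_i \ln(x_i/x_{min})\right]^{-1}$ with standard error $(\hat\gamma - 1)/\sqrt{n}$, assuming $x_{min}$ has already been chosen. It is not necessarily the authors' exact procedure: selecting $x_{min}$ (for example by a Kolmogorov-Smirnov criterion) is omitted, and for integer degrees one may replace $x_{min}$ by $x_{min} - 1/2$ inside the logarithm as a discrete approximation. The helper names are hypothetical.

```python
import math
import random


def powerlaw_mle(samples, x_min):
    """Continuous MLE of gamma for P(x) ~ x^-gamma on x >= x_min, with its standard error."""
    tail = [x for x in samples if x >= x_min]
    if not tail:
        raise ValueError("no samples at or above x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    gamma = 1.0 + len(tail) / log_sum
    std_err = (gamma - 1.0) / math.sqrt(len(tail))
    return gamma, std_err


def sample_powerlaw(gamma, x_min, n, rng):
    """Inverse-transform sampling of a continuous power law (for testing the estimator)."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (gamma - 1.0)) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(42)
    data = sample_powerlaw(1.73, 1.0, 50_000, rng)
    print(powerlaw_mle(data, 1.0))  # estimate should land close to 1.73
```

With 50,000 synthetic samples the standard error is about 0.003, so the estimate recovers the input exponent closely; on real degree data the choice of $x_{min}$ usually matters more than the estimator itself.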

The reciprocal degree is the number of reciprocal links that a node possesses. Figure 4 shows that the reciprocal degree distribution also follows a power law, $P(k_r) \sim k_r^{-\gamma_r}$, with an exponent $\gamma_r \approx 1.69$ as estimated by maximum likelihood [54, 55], similar to those of the indegree and outdegree distributions. Figure 5 shows that the mean reciprocal degree of the nodes with the same indegree, $\langle k_r(k_{in})\rangle$, is approximately proportional to the indegree $k_{in}$ (also see Figures S10-S12 of Appendix S1), i.e., $\langle k_r(k_{in})\rangle \propto k_{in}$, and in a similar fashion $\langle k_r(k_{out})\rangle \propto k_{out}$, implying that the probability that a randomly chosen directed link belongs to a reciprocal link is roughly constant. All these features are consistent with the observation that the indegree, outdegree, and reciprocal degree distributions all follow a similar exponent.
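The quantity $\langle k_r(k_{in})\rangle$ is a simple conditional average. A minimal Python sketch, assuming per-node indegrees and reciprocal degrees are available as dictionaries (nodes with zero indegree are simply absent), groups nodes by indegree and averages their reciprocal degree. It also returns the overall fraction of directed links that belong to a reciprocal pair, $\sum k_r / \sum k_{in}$; a roughly constant ratio $\langle k_r(k_{in})\rangle / k_{in}$ across degree classes is what the text describes. Function names are illustrative.

```python
from collections import defaultdict


def mean_reciprocal_by_indegree(k_in, k_r):
    """<k_r(k_in)>: average reciprocal degree over nodes sharing the same indegree."""
    totals, counts = defaultdict(float), defaultdict(int)
    for node, kin in k_in.items():
        totals[kin] += k_r.get(node, 0)
        counts[kin] += 1
    return {kin: totals[kin] / counts[kin] for kin in sorted(counts)}


def reciprocated_fraction(k_in, k_r):
    """Fraction of directed links that belong to a reciprocal pair: sum(k_r) / sum(k_in)."""
    return sum(k_r.values()) / sum(k_in.values())
```

Plotting the returned dictionary on log-log axes, with the straight line $k_r = c\,k_{in}$ for $c$ equal to `reciprocated_fraction`, gives a quick visual test of proportionality.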

For mesoscale structures, we focus on the four closed triples, i.e., FB, FFa, FFb, and FFc. As the numbers of the three feedforward loops are equal, i.e., $N_{FF_a} = N_{FF_b} = N_{FF_c}$, we only look at the total numbers of FB and FFa closed triples. For ESN, $N_{FB} = 740{,}310$ and $N_{FF_a} = 3{,}586{,}403$, as shown in Table 1. Considering the feedforward loops as the same up to the permutation of the focal node, it is interesting to see that $N_{FB} : N_{FF_a} \approx 1 : 5$. This suggests the existence of some underlying mechanism. Since the indegree and outdegree distributions are heterogeneous, we study the numbers of the four closed triples (i.e., $n_{FB}$, $n_{FF_a}$, $n_{FF_b}$, and $n_{FF_c}$) at different nodes and their distributions. Figure 6 shows that, although the numbers of feedback and feedforward loops are different, their distributions follow similar scaling laws, i.e., $P(n_{FB}) \sim n_{FB}^{-\gamma_{FB}}$ and $P(n_{FF}) \sim n_{FF}^{-\gamma_{FF}}$, with $\gamma_{FB} \approx 1.37$, $\gamma_{FF_a} \approx 1.39$, $\gamma_{FF_b} \approx 1.35$, and $\gamma_{FF_c} \approx 1.38$ as determined by maximum likelihood estimation [54, 55]. More details on the exponents are given in Table S1 of Appendix S1. Moreover, although the numbers of the three feedforward loops are equal, their distributions differ slightly in detail, a phenomenon worthy of further research.

To understand this phenomenon, we consider the three unclosed triples in Figure 1. For a node with indegree $k_{in}$ and outdegree $k_{out}$, there are $C^2_{k_{in}}$ unclosed triples A, $C^1_{k_{in}}C^1_{k_{out}}$ unclosed triples B, and $C^2_{k_{out}}$ unclosed triples C when reciprocal links are forbidden, where $C^m_n = n!/[m!(n-m)!]$ denotes the binomial coefficient. These unclosed triples would generate closed triples in the ratio

$$n'_{FB} : n'_{FF} = \frac{C^1_{k_{in}}C^1_{k_{out}}}{2} : \left(C^2_{k_{in}} + C^2_{k_{out}} + \frac{C^1_{k_{in}}C^1_{k_{out}}}{2}\right).$$

Accounting for all the nodes, we obtain the total numbers of possible closed triples

$$N'_{FB} = \sum_i \frac{C^1_{k^i_{in}}C^1_{k^i_{out}}}{2}, \qquad N'_{FF} = \sum_i \left(C^2_{k^i_{in}} + C^2_{k^i_{out}} + \frac{C^1_{k^i_{in}}C^1_{k^i_{out}}}{2}\right),$$

respectively. Assuming there is no degree correlation and making use of $k_{in} \sim k_{out}$, we have $N_{FB} : N_{FF} \approx 1 : 5$, which is basically consistent with the ratio found in ESN. The assumption of no degree correlation is supported by the results in Figure 7(a), in which the network shows a very weak degree correlation over two decades that can be treated almost as no degree correlation (further quantitative evidence is given by the Pearson correlation coefficient in Table S2 of Appendix S1) [58]. In this case, the number of closed triples at a node depends only on its indegree $k_{in}$ and outdegree $k_{out}$, i.e., $n_{FB} \sim k_{in}^2$ and $n_{FF} \sim k_{in}^2$ for large $k_{in}$, and $n_{FB} \sim k_{out}^2$ and $n_{FF} \sim k_{out}^2$ for large $k_{out}$. This behavior is confirmed in Figures 8 and 9 (also see Figures S19-S24 of Appendix S1), and it explains why the distributions of the four closed triples follow similar scaling laws. Results for the other three networks (i.e., Slashdot, Flickr, and YouTube) exhibit similar phenomena (see Figures S1-S24 of Appendix S1).

Directed Network Model 

We propose a growing network model with node and link creation processes incorporating link directionality that reproduces the empirical features. In the model, we consider two evolutionary ingredients: reciprocation and preferential attachment. On the one hand, many empirical results show that the reciprocity $r$ of online social networks is much greater than that of sparse random directed networks, for which $r \to 0$ [51, 53]. Our results of $r \approx 0.45$ for FSN and $r \approx 0.65$ for YSN provide further evidence. The high reciprocity implies that there is a good chance that the creation of a directed link prompts the establishment of a reversed link. For example, users of Flickr often respond to an incoming link by quickly establishing a reversed link as a matter of courtesy [5]. Thus, reciprocation is believed to be an independent growth mechanism in large-scale online social networks. On the other hand, preferential attachment has been shown to be an important and basic growth mechanism in online social networks [5, 32, 55, 56]. Users with large indegrees and outdegrees are more likely to receive incoming links and create outgoing links, respectively. This motivated us to incorporate a preferential attachment mechanism depending on the product $k^i_{out}k^j_{in}$ in creating new links.

The model starts with an initial seed consisting of $m_0$ nodes. At each time step, a new node is added and, on average, $2 + m + mp$ new directed links are introduced according to two processes: external reciprocation and internal evolution.

(1) External reciprocation. In every time step, the new node establishes a new directed link to an existing node $i$ in the network with probability

$$\Pi_i = \frac{k^i_{in}}{\sum_j k^j_{in}}, \tag{1}$$

proportional to the indegree $k^i_{in}$ of node $i$. To incorporate the reciprocation mechanism, node $i$, which receives the link, creates a reversed link to the new node. Consequently, a reciprocal link is created between these two nodes. This mechanism is reasonable in that a strong motivation for a new user joining a social network is to get connected to and interact with someone already in the network. As we shall see, this process can be treated conveniently in the mathematical analysis of the model.

(2) Internal evolution. In each time step, $m$ new directed links, representing the activity of the network, are created among the existing nodes according to the preferential attachment mechanism. Consider two nodes $i$ and $j$ that are unconnected up to that time step; a new directed link from node $i$ to node $j$ is created with probability

$$\Pi_{i\to j} = \frac{k^i_{out}k^j_{in}}{\sum_{x,y:\,x\notin\Gamma_{in}(y)} k^x_{out}k^y_{in}}, \tag{2}$$

where $k^i_{out}$ and $k^j_{in}$ are the outdegree of node $i$ and the indegree of the target node $j$, respectively, and $\Gamma_{in}(y)$ in the normalization factor is the set of incoming neighbors of node $y$ at that time step. This attachment probability is proportional to the product $k^i_{out}k^j_{in}$: the larger the product, the greater the probability that a new directed link is created between the two nodes. For each new directed link created, a reversed link is established with the reciprocation probability $p$. Therefore, $m + mp$ directed links are introduced (on average) into the network through internal evolution in each time step. Note that multiple links between two nodes and self-connections are prohibited in the model.
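To make the two processes concrete, here is a minimal Python sketch of one possible simulation of the model. Several details are our assumptions rather than specifications from the text: the seed is taken to be $m_0 = 3$ nodes joined pairwise by reciprocal links; pairs for Eq. (2) are drawn by rejection sampling (pick $i$ with probability proportional to $k_{out}$ and $j$ proportional to $k_{in}$ independently, then reject self-links and already existing links $i \to j$); a link is skipped if no admissible pair is found within a fixed number of tries; and a non-integer $m$ is realized probabilistically. The function name `grow_network` and its arguments are hypothetical.

```python
import random


def grow_network(n_nodes, m, p, m0=3, seed=None, max_tries=100):
    """Sketch of the growing model; returns a set of directed links (i, j) meaning i -> j."""
    rng = random.Random(seed)
    edges = set()
    in_pool, out_pool = [], []  # node v appears k_in(v) / k_out(v) times

    def add_link(i, j):
        edges.add((i, j))
        out_pool.append(i)
        in_pool.append(j)

    # Assumed seed: m0 nodes joined pairwise by reciprocal links.
    for i in range(m0):
        for j in range(m0):
            if i != j:
                add_link(i, j)

    m_whole, m_frac = int(m), m - int(m)
    for new in range(m0, n_nodes):
        # (1) External reciprocation: target chosen with probability proportional to k_in.
        target = rng.choice(in_pool)
        add_link(new, target)
        add_link(target, new)
        # (2) Internal evolution: floor(m) links, plus one more with probability frac(m).
        n_links = m_whole + (1 if rng.random() < m_frac else 0)
        for _ in range(n_links):
            for _ in range(max_tries):
                i, j = rng.choice(out_pool), rng.choice(in_pool)  # weight k_out^i * k_in^j
                if i != j and (i, j) not in edges:
                    break
            else:
                continue  # no admissible pair found; skip this link
            add_link(i, j)
            if rng.random() < p and (j, i) not in edges:
                add_link(j, i)  # reciprocation with probability p
    return edges
```

Keeping each node in a list once per unit of degree makes every degree-proportional draw O(1), and rejecting forbidden pairs reproduces the restricted normalization of Eq. (2). For example, `grow_network(3000, 2, 0.5, seed=7)` should give a mean indegree near $2 + m + mp = 5$ and a reciprocity near $(1 + mp)/(1 + m) = 2/3$, slightly shifted because a reversed link is never duplicated.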



Materials and Methods 
Rate Equation Analysis 

We first analyze the indegree and outdegree distributions of the model. After $t$ steps, the growing directed network has $N = m_0 + t$ nodes and $(2 + m + mp)t$ directed links, where the small number of initial links in the seed is ignored. Meanwhile, the sum of indegrees and the sum of outdegrees are equal, i.e., $\sum k_{in} = \sum k_{out} = (2 + m + mp)t$. For a sparse network with mean indegree $\langle k\rangle = 2 + m + mp \ll N$, we have $\sum_{x,y:\,x\notin\Gamma_{in}(y)} k^x_{out}k^y_{in} \approx \sum k_{out} \times \sum k_{in} = [(2 + m + mp)t]^2$, so that Eq. (2) can be approximated by

$$\Pi_{i\to j} \approx \frac{k^i_{out}k^j_{in}}{[(2 + m + mp)t]^2}. \tag{3}$$

Consider the creation of one new directed link via internal evolution at step $t$. The probability that the indegree $k^i_{in}$ of node $i$ increases by one due to the creation of one link is

$$p^+_{in,i} = \sum_{j\notin\Gamma_{in}(i)} \frac{k^j_{out}k^i_{in}}{[(2 + m + mp)t]^2} + p\sum_{j\notin\Gamma_{out}(i)} \frac{k^i_{out}k^j_{in}}{[(2 + m + mp)t]^2}, \tag{4}$$

where $\Gamma_{out}(i)$ is the set of outgoing neighbors of node $i$, the first term gives the probability that node $i$ receives a new incoming link from one of the other nodes, and the second term gives the probability that a reversed link is created back to node $i$ when a new directed link is created from node $i$ to some node $j$. Since $\sum_{j\notin\Gamma_{in}(i)} k^j_{out} \approx \sum_{j\notin\Gamma_{out}(i)} k^j_{in} \approx (2 + m + mp)t$, $p^+_{in,i}$ is approximately given by

$$p^+_{in,i} \approx \frac{k^i_{in} + p\,k^i_{out}}{(2 + m + mp)t}. \tag{5}$$

Similarly, the probability $p^+_{out,i}$ that the outdegree $k^i_{out}$ of node $i$ increases by one due to the creation of one link is

$$p^+_{out,i} \approx \frac{k^i_{out} + p\,k^i_{in}}{(2 + m + mp)t}. \tag{6}$$



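Equations for the rate of change of the expected indegree $k^i_{in}$ and outdegree $k^i_{out}$ can then be written down. Taking $k^i_{in}$ and $k^i_{out}$ as continuous variables, the dynamical equations are

$$\frac{dk^i_{in}(t)}{dt} = \frac{k^i_{in}}{(2 + m + mp)t} + m\,p^+_{in,i}, \qquad \frac{dk^i_{out}(t)}{dt} = \frac{k^i_{in}}{(2 + m + mp)t} + m\,p^+_{out,i}, \tag{7}$$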



where the first term in each equation comes from the node newly added in a time step (node $i$, chosen with probability proportional to $k^i_{in}$, gains both an incoming and an outgoing link). The difference of the two equations gives

$$\frac{d[k^i_{in}(t) - k^i_{out}(t)]}{dt} = m\,p^+_{in,i} - m\,p^+_{out,i} = \frac{m(1 - p)(k^i_{in} - k^i_{out})}{(2 + m + mp)t}, \tag{8}$$

where Eqs. (5) and (6) have been used. Let $t_i$ be the time at which node $i$ is introduced, i.e., $k^i_{in}(t_i) = k^i_{out}(t_i) = 1$. It follows from Eq. (8) that $k^i_{in}(t) = k^i_{out}(t)$ at any time $t$. Although the expected difference between the indegree and outdegree of a node does not grow over time mathematically, the difference does exist in a particular realization of the model in simulations. Eq. (7) and the initial condition $k^i_{in}(t_i) = k^i_{out}(t_i) = 1$ give

$$k^i_{in}(t) = k^i_{out}(t) = \left(\frac{t}{t_i}\right)^{\beta}, \tag{9}$$

where $\beta = (1 + m + mp)/(2 + m + mp)$. The indegree and outdegree of the nodes both grow over time in the same functional form, with older nodes having higher indegrees and outdegrees.
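A quick numerical check of Eq. (9): the Python sketch below integrates Eq. (7), with the mean-field rates of Eqs. (5) and (6), by a classical fourth-order Runge-Kutta scheme in the variable $s = \ln t$, where the system becomes linear with constant coefficients. It then compares the result with $(t/t_i)^{\beta}$. The step count and the example parameters ($m = 4.34$, $p = 0.08$, quoted later for ESN) are arbitrary choices, and the function names are ours.

```python
import math


def integrate_degrees(m, p, t_i, t_end, steps=2000):
    """RK4 integration of Eq. (7) from k_in = k_out = 1 at t_i, using s = ln t."""
    c = 2.0 + m + m * p  # (2 + m + mp)

    def rhs(k_in, k_out):
        # t * dk/dt from Eq. (7), with p+_in and p+_out from Eqs. (5) and (6)
        return ((k_in + m * (k_in + p * k_out)) / c,
                (k_in + m * (k_out + p * k_in)) / c)

    h = (math.log(t_end) - math.log(t_i)) / steps
    k = (1.0, 1.0)
    for _ in range(steps):
        a1 = rhs(*k)
        a2 = rhs(k[0] + h / 2 * a1[0], k[1] + h / 2 * a1[1])
        a3 = rhs(k[0] + h / 2 * a2[0], k[1] + h / 2 * a2[1])
        a4 = rhs(k[0] + h * a3[0], k[1] + h * a3[1])
        k = tuple(k[n] + h / 6 * (a1[n] + 2 * a2[n] + 2 * a3[n] + a4[n]) for n in range(2))
    return k


def degree_eq9(m, p, t_i, t):
    """Closed form of Eq. (9): (t / t_i)^beta with beta = (1 + m + mp) / (2 + m + mp)."""
    beta = (1.0 + m + m * p) / (2.0 + m + m * p)
    return (t / t_i) ** beta
```

Because the initial condition $k_{in} = k_{out}$ lies on the eigenvector with eigenvalue $\beta$, the numerical indegree and outdegree stay equal and follow $(t/t_i)^{\beta}$; the other eigenvalue, $m(1-p)/(2+m+mp)$, governs only the difference in Eq. (8).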

Let $N_{k_{in}}(t)$ and $N_{k_{out}}(t)$ be the numbers of nodes with expected indegree $k_{in}$ and outdegree $k_{out}$ at time step $t$, respectively. The rate equation of $N_{k_{in}}(t)$ is then given by

$$\frac{dN_{k_{in}}(t)}{dt} = \frac{k_{in} - 1}{\sum k_{in}}N_{k_{in}-1} - \frac{k_{in}}{\sum k_{in}}N_{k_{in}} + m\,p^+_{k_{in}-1}N_{k_{in}-1} - m\,p^+_{k_{in}}N_{k_{in}} + \delta_{k_{in},1}, \tag{10}$$

where $p^+_{k_{in}} = (1 + p)k_{in}/[(2 + m + mp)t]$ is the probability of Eq. (5) evaluated for a node with $k_{in} = k_{out}$, as implied by Eq. (9). The first and third terms on the right-hand side account for the increase of $N_{k_{in}}(t)$ due to external reciprocation and internal evolution, respectively, and the second and fourth terms account for the corresponding decrease due to the same processes. The last term accounts for the introduction of a new node with indegree $k_{in} = 1$ at time $t$. Eq. (10) is valid for all $k_{in} \ge 1$.

After many steps $t$, there are $N = m_0 + t \approx t$ nodes in the network. In the asymptotic limit, we substitute $N_{k_{in}}(t) = tP(k_{in})$, where $P(k_{in})$ is the indegree distribution [59], and $\sum k_{in} = (2 + m + mp)t$ into Eq. (10) to obtain the simple recursive relation

$$[2 + m + mp + (1 + m + mp)k_{in}]P(k_{in}) = (1 + m + mp)(k_{in} - 1)P(k_{in} - 1) + (2 + m + mp)\delta_{k_{in},1}. \tag{11}$$
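Using the initial condition that $k_{in} = 1$ at the time a node is introduced, the solution of Eq. (11) is

$$P(k_{in}) = A\,\frac{\Gamma(k_{in})}{\Gamma\left(k_{in} + 2 + \frac{1}{1 + m + mp}\right)}, \tag{12}$$

where $A = \frac{2 + m + mp}{1 + m + mp}\,\Gamma\left(2 + \frac{1}{1 + m + mp}\right)$ and $\Gamma$ is the Euler gamma function. Using the asymptotic form $\Gamma(x + \lambda)/\Gamma(x) \sim x^{\lambda}$ as $x \to \infty$, we can extract the scaling form

$$P(k_{in}) \approx A\,k_{in}^{-\left(2 + \frac{1}{1 + m + mp}\right)}. \tag{13}$$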
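Eqs. (11)-(13) can be cross-checked numerically. The Python sketch below iterates the recursion of Eq. (11) upward from $k_{in} = 1$ and compares it with the closed form of Eq. (12), written as $P(k) = A\,\Gamma(k)/\Gamma(k + \gamma)$ with $\gamma = 2 + 1/(1 + m + mp)$ and $A = (\gamma - 1)\Gamma(\gamma)$, which is the same constant as above since $\gamma - 1 = (2 + m + mp)/(1 + m + mp)$. The parameter values are the ESN values quoted in the Simulation Results and serve only as an example; the function names are ours.

```python
import math


def pk_recursion(m, p, k_max):
    """Iterate Eq. (11) upward from k = 1; index 0 of the returned list is unused."""
    a = 1.0 + m + m * p  # (1 + m + mp)
    b = 2.0 + m + m * p  # (2 + m + mp)
    P = [0.0, b / (b + a)]  # P(1) fixed by the Kronecker-delta source term
    for k in range(2, k_max + 1):
        P.append(a * (k - 1) * P[k - 1] / (b + a * k))
    return P


def pk_closed_form(m, p, k):
    """Eq. (12): P(k) = A * Gamma(k) / Gamma(k + gamma), evaluated via log-gamma."""
    gamma = 2.0 + 1.0 / (1.0 + m + m * p)
    log_a = math.log(gamma - 1.0) + math.lgamma(gamma)
    return math.exp(log_a + math.lgamma(k) - math.lgamma(k + gamma))


m, p = 4.34, 0.08
P = pk_recursion(m, p, 5000)
# Local log-log slope of the tail; it should approach gamma of Eq. (17).
slope = -(math.log(P[5000]) - math.log(P[2500])) / math.log(2.0)
```

Working with `math.lgamma` avoids overflow of $\Gamma(k)$ for large $k$, and the recursion sums to one because the mean degree is finite for $\gamma > 2$.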
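Similarly, the rate equation of $N_{k_{out}}(t)$ is given by

$$\frac{dN_{k_{out}}(t)}{dt} = \frac{k_{out} - 1}{\sum k_{in}}N_{k_{out}-1} - \frac{k_{out}}{\sum k_{in}}N_{k_{out}} + m\,p^+_{k_{out}-1}N_{k_{out}-1} - m\,p^+_{k_{out}}N_{k_{out}} + \delta_{k_{out},1}, \tag{14}$$

with $p^+_{k_{out}}$ defined analogously from Eq. (6). The first (second) and third (fourth) terms on the right-hand side account for the increase (decrease) in $N_{k_{out}}$ due to external reciprocation and internal evolution, respectively, and the last term accounts for the introduction of a new node with $k_{out} = 1$ at time $t$. Substituting $N_{k_{out}}(t) = tP(k_{out})$, where $P(k_{out})$ is the outdegree distribution, and $\sum k_{in} = (2 + m + mp)t$ into Eq. (14), the recursive relation for $P(k_{out})$ is

$$[2 + m + mp + (1 + m + mp)k_{out}]P(k_{out}) = (1 + m + mp)(k_{out} - 1)P(k_{out} - 1) + (2 + m + mp)\delta_{k_{out},1}, \tag{15}$$

which is identical to Eq. (11) for $P(k_{in})$. It follows that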



$$P(k_{out}) \approx A\,k_{out}^{-\left(2 + \frac{1}{1 + m + mp}\right)}. \tag{16}$$

The results show that the expected indegree and outdegree grow over time following the same functional form, Eq. (9), and that the indegree and outdegree distributions follow the same scaling law with the exponent

$$\gamma = 2 + \frac{1}{1 + m + mp}. \tag{17}$$

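Next, we consider the reciprocal degree distribution $P(k_r)$. For a node $i$ with $k^i_{in} = k^i_{out}$, $k^i_r$ satisfies the dynamical equation

$$\frac{dk^i_r(t)}{dt} = \frac{k^i_{in}(t)}{(2 + m + mp)t} + \frac{mp\,k^i_{in}(t) + mp\,k^i_{out}(t)}{(2 + m + mp)t}, \tag{18}$$

where the first term counts the reciprocal link created by external reciprocation and the second counts internal links into or out of node $i$ that are reciprocated with probability $p$.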

Substituting Eq. (9) into Eq. (18) and using the initial condition $k^i_r(t_i) = 1$ at the time node $i$ is introduced, the solution to Eq. (18) is

$$k^i_r(t) = \frac{1 + 2mp}{1 + m + mp}\left[k^i_{in}(t) - 1\right] + 1. \tag{19}$$

For large $k^i_{in}$, we have

$$k^i_r \approx \frac{1 + 2mp}{1 + m + mp}\,k^i_{in}. \tag{20}$$

Using $P(k_r)\,dk_r = P(k_{in})\,dk_{in}$, the distribution $P(k_r)$ follows

$$P(k_r) \sim k_r^{-\gamma}, \tag{21}$$

where $\gamma$ is given by Eq. (17), as for the indegree and outdegree distributions.
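Eq. (19) can be checked the same way. The Python sketch below integrates Eq. (18) together with the indegree equation of Eq. (7), using $k_{in} = k_{out}$ from Eq. (9), by fourth-order Runge-Kutta in $s = \ln t$, and compares $k_r$ with Eq. (19). The example parameters ($m = 4.34$, $p = 0.08$) are the ESN values quoted later; the function names are ours.

```python
import math


def integrate_reciprocal(m, p, t_i, t_end, steps=2000):
    """RK4 integration of Eq. (18) alongside Eq. (7), assuming k_in = k_out (Eq. (9))."""
    c = 2.0 + m + m * p

    def rhs(k_in, k_r):
        # t * d/dt of (k_in, k_r); the k_r rate depends only on k_in
        return ((1.0 + m + m * p) * k_in / c, (1.0 + 2.0 * m * p) * k_in / c)

    h = (math.log(t_end) - math.log(t_i)) / steps
    k_in, k_r = 1.0, 1.0
    for _ in range(steps):
        a1 = rhs(k_in, k_r)
        a2 = rhs(k_in + h / 2 * a1[0], k_r + h / 2 * a1[1])
        a3 = rhs(k_in + h / 2 * a2[0], k_r + h / 2 * a2[1])
        a4 = rhs(k_in + h * a3[0], k_r + h * a3[1])
        k_in += h / 6 * (a1[0] + 2 * a2[0] + 2 * a3[0] + a4[0])
        k_r += h / 6 * (a1[1] + 2 * a2[1] + 2 * a3[1] + a4[1])
    return k_in, k_r


def reciprocal_degree_eq19(m, p, k_in):
    """Eq. (19): k_r = (1 + 2mp) / (1 + m + mp) * (k_in - 1) + 1."""
    return (1.0 + 2.0 * m * p) / (1.0 + m + m * p) * (k_in - 1.0) + 1.0
```

For these parameters the slope $(1 + 2mp)/(1 + m + mp)$ of Eq. (20) is about 0.298, so a node's reciprocal degree is roughly 30% of its indegree in the large-degree limit.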

Furthermore, we analyze the degree correlations between connected nodes by the rate equation approach. Let $N^{l_{out}}_{k_{in}}$ be the number of links that originate from a node with expected outdegree $l_{out}$ and point to a node with expected indegree $k_{in}$ [60]. Generally, $N^{l_{out}}_{k_{in}}$ is defined for $k_{in} \ge 1$ and $l_{out} \ge 2$. The quantity $N^{l_{out}}_{k_{in}}(t)$ evolves according to

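$$\begin{aligned}
\frac{dN^{l_{out}}_{k_{in}}}{dt} ={}& \frac{(k_{in} - 1)N^{l_{out}}_{k_{in}-1} - k_{in}N^{l_{out}}_{k_{in}}}{\sum k_{in}} + \frac{(l_{out} - 1)N^{l_{out}-1}_{k_{in}} - l_{out}N^{l_{out}}_{k_{in}}}{\sum k_{in}} + \frac{(l_{out} - 1)N_{l_{out}-1}}{\sum k_{in}}\,\delta_{k_{in},1} \\
&+ (m + mp)\,\frac{(k_{in} - 1)N^{l_{out}}_{k_{in}-1} - k_{in}N^{l_{out}}_{k_{in}} + (l_{out} - 1)N^{l_{out}-1}_{k_{in}} - l_{out}N^{l_{out}}_{k_{in}}}{\sum k_{in}} \\
&+ (m + mp)\,\frac{(l_{out} - 1)(k_{in} - 1)}{\sum_{x,y:\,x\notin\Gamma_{in}(y)} k^x_{out}k^y_{in}}\,N_{l_{out}-1}N_{k_{in}-1} - (m + mp)\,\frac{l_{out}k_{in}}{\sum_{x,y:\,x\notin\Gamma_{in}(y)} k^x_{out}k^y_{in}}\,N_{l_{out}}N_{k_{in}},
\end{aligned} \tag{22}$$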

where the first two terms on the right-hand side account for the changes due to the introduction of a new node, including the gains when the new node is connected to a node with indegree $k_{in} - 1$ (outdegree $l_{out} - 1$) that is already connected to a node with outdegree $l_{out}$ (indegree $k_{in}$), and the losses when the new node is connected to either end of a link that connects a node with outdegree $l_{out}$ and another node with indegree $k_{in}$. The third term accounts for the gain in $N^{l_{out}}_{k_{in}}$ due to the addition of the new node. The remaining terms take into account the changes due to the internal evolution process with the introduction of $m + mp$ directed links.

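Asymptotically, $N^{l_{out}}_{k_{in}} \to (2 + m + mp)t\,P^{l_{out}}_{k_{in}}$, $N_{k_{in}} \to tP(k_{in})$, and $N_{l_{out}} \to tP(l_{out})$. Considering $\sum k_{in}N_{k_{in}} = \sum l_{out}N_{l_{out}} = (2 + m + mp)t$ and $\sum_{x,y:\,x\notin\Gamma_{in}(y)} k^x_{out}k^y_{in} \approx \sum k_{out} \times \sum k_{in} \approx [(2 + m + mp)t]^2$, Eq. (22) gives a recursive relation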

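$$\begin{aligned}
[2 + m + mp + (1 + m + mp)(k_{in} + l_{out})]P^{l_{out}}_{k_{in}} ={}& (1 + m + mp)\left[(k_{in} - 1)P^{l_{out}}_{k_{in}-1} + (l_{out} - 1)P^{l_{out}-1}_{k_{in}}\right] + \frac{1}{2 + m + mp}(l_{out} - 1)P(l_{out} - 1)\,\delta_{k_{in},1} \\
&+ \frac{m + mp}{(2 + m + mp)^2}\left[(l_{out} - 1)(k_{in} - 1)P(l_{out} - 1)P(k_{in} - 1) - l_{out}k_{in}P(l_{out})P(k_{in})\right].
\end{aligned} \tag{23}$$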

Solving Eq. (23) directly for $P^{l_{out}}_{k_{in}}$ is difficult. However, it is observed that the decomposition

$$P^{l_{out}}_{k_{in}} \propto l_{out}P(l_{out})\,k_{in}P(k_{in}), \tag{24}$$

with $P(l_{out})$ given by Eq. (16) and $P(k_{in})$ given by Eq. (13), satisfies Eq. (23) in the scaling regime, as one can readily show by substituting Eq. (24) into Eq. (23) and taking the limits $l_{out} \to \infty$ and $k_{in} \to \infty$. Eq. (24) implies that there is no degree correlation, a feature supported by the empirical results in Figure 7 for ESN over a wide range of degrees (also see Figures S16-S18 of Appendix S1). It also follows from $k^i_{in} = k^i_{out}$ and Eq. (24) that $P^{l_{out}}_{k_{in}} = P^{l_{in}}_{k_{out}} = P^{l_{out}}_{k_{out}} = P^{l_{in}}_{k_{in}}$. Interpreting $P^{l_{out}}_{k_{in}}$ as a joint probability, the lack of degree correlation expressed in Eq. (24) implies that the conditional probability

$$P(k_{in} \mid l_{out}) \propto k_{in}P(k_{in}), \tag{25}$$

which is independent of $l_{out}$. For a node $i$ with large $k^i_{out} \approx k^i_{in}$, the average nearest-neighbor function can be calculated as

$$k^{in}_{nn}(k_{out}) = \sum_{k_{in}} k_{in}P(k_{in} \mid k_{out}) \propto \sum_{k_{in}} k_{in}^2P(k_{in}), \tag{26}$$

which is also independent of $k_{out}$. This is consistent with the behavior of $k^{in}_{nn}(k_{out})$ in ESN, as shown in Figure 7.
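The number of FB loops can be formally written as [61]

$$n_{FB} = k_{in}\,k_{out}\sum_{k'_{out},\,k''_{in}} P(k'_{out} \mid k_{out})\,P(k''_{in} \mid k_{in})\,P^{k'_{out}}_{k''_{in}}, \tag{27}$$

where $P(k'_{out} \mid k_{out})$ ($P(k''_{in} \mid k_{in})$) is the probability that an out-neighbor (in-neighbor) of the focal node has outdegree $k'_{out}$ (indegree $k''_{in}$), and $P^{k'_{out}}_{k''_{in}}$ is the probability that a link connects a node with outdegree $k'_{out}$ to a node with indegree $k''_{in}$. The lack of degree correlations makes the summations independent of $k_{in}$ and $k_{out}$, and thus $n_{FB}$ scales as

$$n_{FB} \sim k_{in}^2. \tag{28}$$

Similarly, the numbers $n_A$ of the four closed triples at a node with large indegree and outdegree follow the scaling behavior $n_A \sim k_{in}^2$ or $n_A \sim k_{out}^2$. Combining $n_A \sim k_{in}^2$ with Eq. (13), $P(k_{in}) \sim k_{in}^{-\gamma}$, the distributions of the four closed triples have the same scaling behavior:

$$P(n_A) \sim n_A^{-\gamma_A}, \tag{29}$$

where the exponent $\gamma_A$ can be readily found by using $P(n_A)\,dn_A = P(k_{in})\,dk_{in}$ to be

$$\gamma_A = \frac{3}{2} + \frac{1}{2(1 + m + mp)}. \tag{30}$$

The exponent $\gamma_A$ is determined by the parameters $m$ and $p$ and falls into the range $(1.5, 2]$.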
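The change of variables behind Eq. (30) can be illustrated by sampling. The Python sketch below draws $k$ from a continuous power law with the exponent $\gamma$ of Eq. (17), maps it to $n = k^2$ as in Eq. (28), and fits both samples with the continuous maximum-likelihood estimator; the fitted exponent of $n$ should be close to $(\gamma + 1)/2$, which equals $\gamma_A$ of Eq. (30). Treating degrees as continuous and setting $k_{min} = 1$ are simplifying assumptions, and the parameter values are only an example.

```python
import math
import random


def mle_exponent(xs, x_min):
    """Continuous power-law MLE: 1 + n / sum(ln(x / x_min))."""
    return 1.0 + len(xs) / sum(math.log(x / x_min) for x in xs)


def closed_triple_exponent(m, p):
    """Eq. (30): gamma_A = 3/2 + 1 / [2 (1 + m + mp)]."""
    return 1.5 + 1.0 / (2.0 * (1.0 + m + m * p))


m, p = 4.34, 0.08
gamma = 2.0 + 1.0 / (1.0 + m + m * p)  # Eq. (17)
rng = random.Random(1)
# Degrees drawn from P(k) ~ k^-gamma with k_min = 1 (inverse-transform sampling).
k = [(1.0 - rng.random()) ** (-1.0 / (gamma - 1.0)) for _ in range(100_000)]
n = [x * x for x in k]  # closed-triple counts scale as k_in^2, Eq. (28)
print(mle_exponent(k, 1.0), mle_exponent(n, 1.0), closed_triple_exponent(m, p))
```

Squaring the variable halves every logarithm in the estimator, so the fitted exponent of $n$ is exactly $(\hat\gamma + 1)/2$ for the sampled $\hat\gamma$, mirroring the analytic step from Eq. (17) to Eq. (30).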

Simulation Results 

We also carried out numerical simulations to study the structural properties of the model and compared the results with data of real online social networks. The activity $m$ and the reciprocation probability $p$ are the two key parameters of the model. They determine the reciprocity $r = (1 + mp)/(1 + m)$ and the mean indegree $\langle k\rangle = 2 + m + mp$ of simulated networks. To compare results with real online social networks, we take three parameters from real data, namely the number of nodes $N$, the reciprocity $r$, and the mean indegree (outdegree) $\langle k\rangle$, and determine the parameters $m$ and $p$ of the model through

$$m = \frac{\langle k\rangle}{1 + r} - 1, \qquad p = \frac{\langle k\rangle r - r - 1}{\langle k\rangle - r - 1}. \tag{31}$$

Taking ESN as an example, we have $\langle k\rangle \approx 6.7$, $r \approx 0.25$, and $N = 75{,}879$. The model parameters are then fixed at $m \approx 4.34$ and $p \approx 0.08$ according to Eq. (31). With these values of $m$ and $p$, a network of $N = 75{,}879$ nodes is simulated. A non-integer value of $m$ is implemented probabilistically: for ESN with $m = 4.34$, for example, after four new directed links are established through the internal evolution process in every time step, a fifth one is created with probability 0.34. The structural properties of the simulated network are analyzed for each of the quantities studied for the real data. Results are shown as red circles in Figures 2-9 for comparison (also see Figures S1-S24 of Appendix S1). The model basically reproduces the key properties of ESN.
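The parameter mapping of Eq. (31) and the resulting theoretical predictions amount to a few lines of arithmetic. The Python sketch below inverts $\langle k\rangle = 2 + m + mp$ and $r = (1 + mp)/(1 + m)$, then evaluates $\gamma$ (Eq. (17)), $\gamma_A$ (Eq. (30)), and the slope of $k_r$ versus $k_{in}$ (Eq. (20)), all rewritten through $1 + m + mp = \langle k\rangle - 1$. With the rounded inputs $\langle k\rangle = 6.7$ and $r = 0.25$ it gives $m \approx 4.36$ and $p \approx 0.078$, close to the quoted $m \approx 4.34$ and $p \approx 0.08$; the small gap presumably reflects rounding of $\langle k\rangle$. Function names are ours.

```python
def model_parameters(mean_k, r):
    """Invert <k> = 2 + m + mp and r = (1 + mp) / (1 + m), as in Eq. (31)."""
    m = mean_k / (1.0 + r) - 1.0
    p = (mean_k * r - r - 1.0) / (mean_k - r - 1.0)
    return m, p


def predicted_exponents(mean_k, r):
    """Theoretical gamma (Eq. (17)), gamma_A (Eq. (30)) and k_r / k_in slope (Eq. (20))."""
    gamma = 2.0 + 1.0 / (mean_k - 1.0)             # uses 1 + m + mp = <k> - 1
    gamma_a = 1.5 + 1.0 / (2.0 * (mean_k - 1.0))
    slope = (2.0 * mean_k * r - r - 1.0) / ((1.0 + r) * (mean_k - 1.0))
    return gamma, gamma_a, slope


m, p = model_parameters(6.7, 0.25)
print(m, p, predicted_exponents(6.7, 0.25))
```

The predicted values, $\gamma \approx 2.18$, $\gamma_A \approx 1.59$, and a slope of about 0.29, match the theoretical numbers discussed below for ESN.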

For the indegree and outdegree distributions (see Figure 2) and the reciprocal degree distribution (see Figure 4), the simulation results also show similar scaling laws, with the exponents $\gamma_{in} \approx 1.95$, $\gamma_{out} \approx 1.96$, and $\gamma_r \approx 2.1$ determined by maximum likelihood estimation [54, 55] (see Table S1 of Appendix S1 for more details). These values are slightly larger than the corresponding exponents in ESN. According to Eqs. (17), (21), and (31), these exponents are equal and the theoretical value is $\gamma = 2 + 1/(\langle k\rangle - 1) \approx 2.17$. Note that the rate equation analysis assumes an infinite system. The difference between the simulated results and the theoretical value comes from the finite size of the simulated network, as well as from the approximations made in estimating the exponents. The indegree and outdegree distributions of the simulated network are in reasonable agreement with the empirical results of ESN. The model, however, gives a reciprocal degree distribution smaller than the ESN empirical results over a wide range of $k_r$. This discrepancy implies that some network growth mechanisms in ESN are not included in the model, e.g., different reciprocation probabilities for different nodes [62]. This, together with a possibly very weak degree correlation in Figure 7 that we ignored, may explain why the simulation results in Figures 3 and 5 are larger than the empirical values for large in/outdegrees, and the small differences in the tails in Figures 8 and 9 [63, 64].

For the four closed triples, the distributions from the simulations follow a power law with almost the same exponent (see Figure 6): $\gamma_{FB} \approx 1.47$, $\gamma_{FF_a} \approx 1.46$, $\gamma_{FF_b} \approx 1.46$ and $\gamma_{FF_c} \approx 1.46$, as determined by maximum likelihood estimation. These values are slightly larger than the exponents found in ESN. Theoretically, $\gamma_\Delta = 3/2 + 1/[2(\langle k \rangle - 1)] \approx 1.58$ according to Eqs. (30) and (31). We note that the theoretical values of both $\gamma$ and $\gamma_\Delta$ depend only on the mean indegree $\langle k \rangle$, which in turn is determined by the two model parameters $m$ and $p$. Figure 10 shows the values of all the $\gamma$-exponents, determined by maximum likelihood estimation, for the four online social networks and the corresponding simulated networks.
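As a quick numerical check of the degree exponent $\gamma = 2 + 1/(\langle k \rangle - 1)$ and the closed-triple exponent $\gamma_\Delta = 3/2 + 1/[2(\langle k \rangle - 1)]$, the sketch below evaluates both for the four networks. It assumes that the mean in/outdegree is estimated as $\langle k \rangle = E/N$ from Table 1, since every directed link adds one to an indegree and one to an outdegree. The results agree with the theoretical values quoted in the text and in the figure captions (2.17, 2.1, 2.08, 2.3 and 1.58, 1.55, 1.54, 1.65) to within about 0.01; the helper names are illustrative.

```python
# Number of users N and directed links E for each network (Table 1).
TABLE_1 = {
    "Epinions": (75_879, 508_825),
    "Slashdot": (82_168, 870_161),
    "Flickr": (1_715_255, 22_613_980),
    "YouTube": (1_138_499, 4_945_382),
}


def mean_degree(n_users, n_links):
    """<k> = E / N, the mean indegree (equal to the mean outdegree)."""
    return n_links / n_users


def gamma_degree(k_mean):
    """Exponent of the in-, out- and reciprocal degree distributions."""
    return 2.0 + 1.0 / (k_mean - 1.0)


def gamma_closed_triple(k_mean):
    """Exponent of the distributions of the four closed triples."""
    return 1.5 + 1.0 / (2.0 * (k_mean - 1.0))


for name, (n_users, n_links) in TABLE_1.items():
    k = mean_degree(n_users, n_links)
    print(f"{name:9s} <k> = {k:6.3f}  gamma = {gamma_degree(k):.3f}  "
          f"gamma_Delta = {gamma_closed_triple(k):.3f}")
```

Because both exponents are functions of $1/(\langle k \rangle - 1)$ alone, plotting them against this variable, as in Figure 10, gives straight lines with slopes 1 and 1/2.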

The two parameters $m$ and $p$ affect the reciprocal degree $k_r$ of nodes through Eq. (20). Substituting Eq. (31) into Eq. (20), we have $k_r \approx (2\langle k \rangle r - r - 1) k_{\rm in} / [(\langle k \rangle - 1)(1 + r)] \approx 0.3 k_{\rm in}$ for ESN. The reciprocal degree $k_r$ of a node and its $k_{\rm in}$ are thus related by a factor that depends on the two global parameters $\langle k \rangle$ and $r$. This linear relationship between $k_r$ and $k_{\rm in}$ ($k_{\rm out}$) with slope 0.3 is observed in the simulation results, as shown in Figure 5, but the ESN data show a faster increase of $k_r$ with $k_{\rm in}$ and $k_{\rm out}$. For networks with a larger reciprocity, such as Slashdot ($r \approx 0.73$), Flickr ($r \approx 0.45$) and YouTube ($r \approx 0.65$), the agreement is better (see Figures S10-S12 of Appendix S1). Despite some small differences in the tails in Figures 8 and 9, which may be caused by a local proximity bias in link creation [5], the simulation results for the dependence of the number of closed triples on $k_{\rm in}$ and $k_{\rm out}$ are basically consistent with the empirical results.
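The slope of the linear relation $k_r \approx c\, k_{\rm in}$, with $c = (2\langle k \rangle r - r - 1)/[(\langle k \rangle - 1)(1 + r)]$, can be evaluated in the same way. The sketch below assumes $\langle k \rangle = E/N$ and uses the rounded reciprocities of Table 1; it gives slopes within about 0.01 of the values 0.3, 0.82, 0.59 and 0.73 quoted for Figures 5 and S10-S12, the small residual presumably reflecting the rounding of $r$. The function name is illustrative.

```python
# (N, E, r) for each network, taken from Table 1.
NETWORKS = {
    "Epinions": (75_879, 508_825, 0.25),
    "Slashdot": (82_168, 870_161, 0.73),
    "Flickr": (1_715_255, 22_613_980, 0.45),
    "YouTube": (1_138_499, 4_945_382, 0.65),
}


def reciprocal_slope(k_mean, r):
    """Slope c in k_r ~ c * k_in, with c = (2<k>r - r - 1) / [(<k> - 1)(1 + r)]."""
    return (2.0 * k_mean * r - r - 1.0) / ((k_mean - 1.0) * (1.0 + r))


for name, (n_users, n_links, r) in NETWORKS.items():
    k_mean = n_links / n_users
    print(f"{name:9s} k_r / k_in ~ {reciprocal_slope(k_mean, r):.3f}")
```

As a sanity check of the formula, $r = 1$ gives $c = 1$ for any $\langle k \rangle$, i.e. when every link is reciprocated the reciprocal degree equals the indegree.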

Further comparisons between the model and the large-scale online social networks are given in Appendix S1 (see Figures S1-S24). These results further support the notion that the two mechanisms incorporated in our model provide a potential explanation of the local-scale and mesoscale structures in these online social networks.

Discussion 

With advances in information technology, online social systems have become an increasingly important part of modern life. It is therefore of great significance to study the structures and dynamics of these systems. In this study, we focused on the local-scale, mesoscale and macroscale structural properties of online social networks, especially on how the local-scale and macroscale properties influence the mesoscale structures. We analyzed the data of four large-scale online social networks and extracted their local-scale and macroscale structural properties. The indegree and outdegree distributions were found to follow a similar scaling law, which follows from the fact that $k_{\rm in} \sim k_{\rm out}$ for most of the nodes. This implies a preferential attachment mechanism in which the product of the outdegree of the node creating a link and the indegree of the node receiving it, $k_{\rm out}^{i} k_{\rm in}^{j}$, is important in the establishment of links during the evolution of online social networks. In addition, the large reciprocity $r$ observed in these networks suggests the existence of a reciprocation mechanism in online social networks. The reciprocal degree distribution also shows an exponent similar to that of the indegree distribution, owing to the roughly linear relationship between the reciprocal degree $k_r$ and the indegree $k_{\rm in}$ of nodes (i.e., $k_r \propto k_{\rm in}$), which in turn implies a fixed probability of reciprocal links between connected nodes. At the mesoscale, the close-knit friendship structures are determined by both local-scale (i.e., $k_{\rm in} \sim k_{\rm out}$) and macroscale (i.e., the mean in/outdegree $\langle k \rangle$) structural properties. For a node with large $k_{\rm in} \sim k_{\rm out}$, the numbers of the four closed triples show the same scaling behavior, $n_{FB} \sim n_{FF} \sim k_{\rm in}^2 \sim k_{\rm out}^2$, as a result of the negligible degree correlations in these networks. Over all nodes, the distributions of these closed triples also follow a similar scaling law. Although the total numbers of the three feedforward loops are equal, their distributions differ somewhat in detail.
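To make the definitions of Figure 1 concrete, the following Python sketch counts, for each focal node, the four closed triples in a directed graph stored as adjacency sets, together with the reciprocity $r$ and the reciprocal degree $k_r$. It assumes that each closed triple is identified by an ordered pair of the focal node's neighbours and that self-loops are ignored. Summed over nodes, the totals of $FF_a$, $FF_b$ and $FF_c$ coincide, as stated in the caption of Table 1, while each feedback loop is counted once by each of its three nodes. All function names are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Out- and in-neighbour sets of a directed graph; self-loops are dropped."""
    out_nb, in_nb = defaultdict(set), defaultdict(set)
    for u, v in edges:
        if u != v:
            out_nb[u].add(v)
            in_nb[v].add(u)
    return out_nb, in_nb


def reciprocity(out_nb):
    """Fraction of directed links u -> v whose reverse link v -> u also exists."""
    total = sum(len(targets) for targets in out_nb.values())
    mutual = sum(1 for u, targets in out_nb.items()
                 for v in targets if u in out_nb.get(v, ()))
    return mutual / total if total else 0.0


def reciprocal_degree(out_nb, in_nb, i):
    """k_r(i): number of neighbours linked to i in both directions."""
    return len(out_nb.get(i, set()) & in_nb.get(i, set()))


def closed_triples(out_nb, in_nb, i):
    """Counts of the four closed triples (Figure 1) seen from focal node i."""
    out_i, in_i = out_nb.get(i, set()), in_nb.get(i, set())

    def linked(a, b):  # True if the directed link a -> b exists
        return b in out_nb.get(a, ())

    return {
        # FB: i -> j -> l -> i
        "FB": sum(1 for j in out_i for l in out_nb.get(j, ()) if l != i and l in in_i),
        # FFa: i -> j, i -> l, j -> l (focal indegree 0 within the loop)
        "FFa": sum(1 for j in out_i for l in out_i if j != l and linked(j, l)),
        # FFb: j -> i, i -> l, j -> l (focal indegree 1)
        "FFb": sum(1 for j in in_i for l in out_i if j != l and linked(j, l)),
        # FFc: j -> i, l -> i, j -> l (focal indegree 2)
        "FFc": sum(1 for j in in_i for l in in_i if j != l and linked(j, l)),
    }


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (0, 2), (2, 0), (2, 3), (3, 2)]
    out_nb, in_nb = build_adjacency(edges)
    nodes = sorted(set(out_nb) | set(in_nb))
    totals = defaultdict(int)
    for i in nodes:
        for kind, count in closed_triples(out_nb, in_nb, i).items():
            totals[kind] += count
    print(dict(totals), reciprocity(out_nb))
```

For the six-link example, the totals are FB = 3 (a single directed cycle 0 -> 1 -> 2 -> 0, counted by each of its nodes) and $FF_a = FF_b = FF_c = 1$, with $r = 2/3$.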

To reproduce the empirical features, we proposed and studied a simple directed network model incorporating an external reciprocation process and an internal evolution process. The model has two parameters, the activity $m$ and the reciprocation probability $p$. They can be inferred from the reciprocity $r$ and mean indegree $\langle k \rangle$ of a real online social network according to Eq. (31), so that the simulated and real networks have the same reciprocity and mean indegree. Analytically, we derived the structural properties at the local scale and mesoscale. The results show that the exponents characterizing the distributions of indegree, outdegree, reciprocal degree and the four closed triples depend only on the mean indegree $\langle k \rangle$, i.e., $\gamma = 2 + 1/(\langle k \rangle - 1)$ and $\gamma_\Delta = 3/2 + 1/[2(\langle k \rangle - 1)]$. In addition, the mean indegree $\langle k \rangle$ and the reciprocity $r$ together determine the ratio of the reciprocal degree to the directed in/outdegree, i.e., $k_r \approx (2\langle k \rangle r - r - 1) k_{\rm in} / [(\langle k \rangle - 1)(1 + r)]$. The expected indegree and outdegree of nodes in the model grow as the same function of the time at which the nodes are introduced, so that very old nodes have very high indegrees and outdegrees. This, coupled with an essentially fixed rate of reciprocation, reproduces almost all the properties of the online social networks studied here.

The mesoscale structural properties reported in our work help us understand the interplay between structural properties on different scales in online social networks. More specifically, the mesoscale structures in these online social networks are determined by global parameters as well as by local distributions. This provides a useful perspective for future studies in social network analysis. Our work also provides a better understanding of the evolution of online social networks, especially the emergence of close-knit friendship structures with scaling behavior in their distributions. The two processes (reciprocation and preferential attachment) provide a possible explanation of the mechanisms underlying the local-scale and mesoscale structural properties of online social networks. The former reflects the fact that users often respond to a new incoming link by quickly establishing a reverse link. The latter means that a well-known user with a large $k_{\rm in}$ is more likely to attract new connections, and an active user with a large $k_{\rm out}$ is more likely to create new connections. Our model may also apply to other growing directed networks in which the indegree and outdegree distributions show similar scaling behavior and the reciprocation mechanism is at work. However, the model is not applicable to symmetric online social networks that lack power-law degree distributions [lH3] (e.g., Facebook), nor to the WWW [33] and Wikipedia [56], whose indegree and outdegree distributions have different exponents and in which the reciprocation mechanism is absent. Similarly, it does not apply to citation networks, since a paper can cite only previously published papers, not vice versa.

Although simulations of our model basically reproduce the structural properties of the online social networks at different scales, the differences in the exponents characterizing the distributions and in the tails of the distributions of the real online social networks (e.g., Figures 4, 5, 8 and 9) imply that there are other factors ignored in the model, such as user-dependent reciprocation probabilities and local proximity bias. These factors are natural ingredients for future work. It is also important to study the emergence of communities in online social networks. The present work also lays the basis for understanding the impact of mesoscale structural properties on dynamical processes in online social networks, such as information diffusion, opinion formation and the evolution of cooperation. An interesting problem for future work is whether the model can be applied to offline social networks, which would help reveal the differences between online and offline social networks.

Supporting information 

Appendix S1 Appendix to the manuscript.
(PDF) 
Table S1 The exponents of various distributions obtained by power-law fits of real online social networks and the simulated networks based on the model using maximum likelihood estimation. $x_{\min}$ is the lower bound of the range for fitting a power-law distribution, $\gamma$ is the corresponding exponent and KS is the goodness-of-fit value based on the Kolmogorov-Smirnov statistic.
(PDF) 

Table S2 Pearson correlation coefficient. r(in, in) quantifies the tendency of nodes with a high indegree to be connected to another node with a high indegree. The other quantities carry a similar interpretation.

(PDF) 

Figure S1 Indegree (a) and outdegree (b) distributions of the Slashdot social network (black squares) and simulation results (red circles) based on the model. The dashed lines in both panels have a slope of -2.1, as suggested by the analytic results in Eqs. (17) and (31). The simulated network is generated by the model with the parameters $N = 82168$, $m \approx 5.14$ and $p \approx 0.67$, as determined by the mean degree $\langle k \rangle$ and reciprocity of the Slashdot social network. Data points are averages over logarithmic bins of the indegree $k_{\rm in}$ and outdegree $k_{\rm out}$, respectively.
(PDF) 

Figure S2 Indegree (a) and outdegree (b) distributions of the Flickr social network (black squares) and simulation results (red circles) based on the model. The dashed lines in both panels have a slope of -2.08, as suggested by the analytic results in Eqs. (17) and (31). The simulated network is generated by the model with the parameters $N = 100000$, $m \approx 8.07$ and $p \approx 0.39$, as determined by the mean degree $\langle k \rangle$ and reciprocity of the Flickr social network. Data points are averages over logarithmic bins of the indegree $k_{\rm in}$ and outdegree $k_{\rm out}$, respectively.
(PDF) 

Figure S3 Indegree (a) and outdegree (b) distributions of the YouTube social network (black squares) and simulation results (red circles) based on the model. The dashed lines in both panels have a slope of -2.3, as suggested by the analytic results in Eqs. (17) and (31). The simulated network is generated by the model with the parameters $N = 100000$, $m \approx 4.34$ and $p \approx 0.08$, as determined by the mean degree $\langle k \rangle$ and reciprocity of the YouTube social network. Data points are averages over logarithmic bins of the indegree $k_{\rm in}$ and outdegree $k_{\rm out}$, respectively.

(PDF) 

Figure S4 Relationship between the indegree and the outdegree of nodes in the Slashdot social network and the model. Results of the Slashdot social network (black squares) and simulation results (red circles) based on the model are shown. The blue dashed line represents $k_{\rm in} = k_{\rm out}$. Data points are averages over logarithmic bins of the indegree $k_{\rm in}$.
(PDF) 

Figure S5 Relationship between the indegree and the outdegree of nodes in the Flickr social network and the model. Results of the Flickr social network (black squares) and simulation results (red circles) based on the model are shown. The blue dashed line represents $k_{\rm in} = k_{\rm out}$. Data points are averages over logarithmic bins of the indegree $k_{\rm in}$.
(PDF) 

Figure S6 Relationship between the indegree and the outdegree of nodes in the YouTube social network and the model. Results of the YouTube social network (black squares) and simulation results (red circles) based on the model are shown. The blue dashed line represents $k_{\rm in} = k_{\rm out}$. Data points are averages over logarithmic bins of the indegree $k_{\rm in}$.

(PDF) 

Figure S7 Reciprocal degree distributions of the Slashdot social network and the model. Results of the Slashdot social network (black squares) and simulation results (red circles) based on the model are shown. Analytic treatment (see Eqs. (17) and (31)) suggests a scaling behavior with a slope of -2.1, as shown by the dashed line. Data points are averages over logarithmic bins of the reciprocal degree $k_r$.
(PDF) 

Figure S8 Reciprocal degree distributions of the Flickr social network and the model. Results of the Flickr social network (black squares) and simulation results (red circles) based on the model are shown. Analytic treatment (see Eqs. (17) and (31)) suggests a scaling behavior with a slope of -2.08, as shown by the dashed line. Data points are averages over logarithmic bins of the reciprocal degree $k_r$.
(PDF) 

Figure S9 Reciprocal degree distributions of the YouTube social network and the model. Results of the YouTube social network (black squares) and simulation results (red circles) based on the model are shown. Analytic treatment (see Eqs. (17) and (31)) suggests a scaling behavior with a slope of -2.3, as shown by the dashed line. Data points are averages over logarithmic bins of the reciprocal degree $k_r$.

(PDF) 

Figure S10 Mean reciprocal degree of nodes with (a) the same indegree and (b) the same outdegree in the Slashdot social network and in the model. Results of the Slashdot social network (black squares) and simulation results (red circles) based on the model are shown on a log-log scale in the main panels. Analytic treatment suggests that $\langle k_r \rangle$ depends linearly on $k_{\rm in}$ and $k_{\rm out}$; the blue dashed lines of slope 1 indicate this dependence. The inset in each panel shows the results on a linear scale, where the dashed line has a slope of 0.82, as given by Eqs. (20) and (31). Data points are averages over logarithmic bins of the indegree $k_{\rm in}$ and outdegree $k_{\rm out}$, respectively.
(PDF) 

Figure S11 Mean reciprocal degree of nodes with (a) the same indegree and (b) the same outdegree in the Flickr social network and in the model. Results of the Flickr social network (black squares) and simulation results (red circles) based on the model are shown on a log-log scale in the main panels. Analytic treatment suggests that $\langle k_r \rangle$ depends linearly on $k_{\rm in}$ and $k_{\rm out}$; the blue dashed lines of slope 1 indicate this dependence. The inset in each panel shows the results on a linear scale, where the dashed line has a slope of 0.59, as given by Eqs. (20) and (31). Data points are averages over logarithmic bins of the indegree $k_{\rm in}$ and outdegree $k_{\rm out}$, respectively.
(PDF) 

Figure S12 Mean reciprocal degree of nodes with (a) the same indegree and (b) the same outdegree in the YouTube social network and in the model. Results of the YouTube social network (black squares) and simulation results (red circles) based on the model are shown on a log-log scale in the main panels. Analytic treatment suggests that $\langle k_r \rangle$ depends linearly on $k_{\rm in}$ and $k_{\rm out}$; the blue dashed lines of slope 1 indicate this dependence. The inset in each panel shows the results on a linear scale, where the dashed line has a slope of 0.73, as given by Eqs. (20) and (31). Data points are averages over logarithmic bins of the indegree $k_{\rm in}$ and outdegree $k_{\rm out}$, respectively.
(PDF) 

Figure S13 Distributions of four basic closed triples in the Slashdot social network and the model. Distributions of closed triples corresponding to (a) FB, (b) $FF_a$, (c) $FF_b$ and (d) $FF_c$ loops in the Slashdot social network (black squares) and in the simulated network based on the model (red circles). Analytic treatment (see Eqs. (30) and (31)) suggests a scaling behavior with a slope of -1.55, as shown by the dashed lines. Data points are averages over logarithmic bins of $n_{FB}$, $n_{FF_a}$, $n_{FF_b}$ and $n_{FF_c}$, respectively.
(PDF) 

Figure S14 Distributions of four basic closed triples in the Flickr social network and the model. Distributions of closed triples corresponding to (a) FB, (b) $FF_a$, (c) $FF_b$ and (d) $FF_c$ loops in the Flickr social network (black squares) and in the simulated network based on the model (red circles). Analytic treatment (see Eqs. (30) and (31)) suggests a scaling behavior with a slope of -1.54, as shown by the dashed lines. Data points are averages over logarithmic bins of $n_{FB}$, $n_{FF_a}$, $n_{FF_b}$ and $n_{FF_c}$, respectively.
(PDF) 

Figure S15 Distributions of four basic closed triples in the YouTube social network and the model. Distributions of closed triples corresponding to (a) FB, (b) $FF_a$, (c) $FF_b$ and (d) $FF_c$ loops in the YouTube social network (black squares) and in the simulated network based on the model (red circles). Analytic treatment (see Eqs. (30) and (31)) suggests a scaling behavior with a slope of -1.65, as shown by the dashed lines. Data points are averages over logarithmic bins of $n_{FB}$, $n_{FF_a}$, $n_{FF_b}$ and $n_{FF_c}$, respectively.
(PDF) 

Figure S16 Degree correlations in the Slashdot social network and the model. Degree correlations as measured by four average nearest-neighbor degrees, $\langle k_{nn}^{\rm in}(k_{\rm in}) \rangle$ (squares), $\langle k_{nn}^{\rm out}(k_{\rm in}) \rangle$ (circles), $\langle k_{nn}^{\rm out}(k_{\rm out}) \rangle$ (triangles) and $\langle k_{nn}^{\rm in}(k_{\rm out}) \rangle$ (inverted triangles), for (a) the Slashdot social network and (b) the simulated network based on the model. Data points are averages over logarithmic bins of the indegree $k_{\rm in}$ or outdegree $k_{\rm out}$.

(PDF) 

Figure S17 Degree correlations in the Flickr social network and the model. Degree correlations as measured by four average nearest-neighbor degrees, $\langle k_{nn}^{\rm in}(k_{\rm in}) \rangle$ (squares), $\langle k_{nn}^{\rm out}(k_{\rm in}) \rangle$ (circles), $\langle k_{nn}^{\rm out}(k_{\rm out}) \rangle$ (triangles) and $\langle k_{nn}^{\rm in}(k_{\rm out}) \rangle$ (inverted triangles), for (a) the Flickr social network and (b) the simulated network based on the model. Data points are averages over logarithmic bins of the indegree $k_{\rm in}$ or outdegree $k_{\rm out}$.
(PDF) 

Figure S18 Degree correlations in the YouTube social network and the model. Degree correlations as measured by four average nearest-neighbor degrees, $\langle k_{nn}^{\rm in}(k_{\rm in}) \rangle$ (squares), $\langle k_{nn}^{\rm out}(k_{\rm in}) \rangle$ (circles), $\langle k_{nn}^{\rm out}(k_{\rm out}) \rangle$ (triangles) and $\langle k_{nn}^{\rm in}(k_{\rm out}) \rangle$ (inverted triangles), for (a) the YouTube social network and (b) the simulated network based on the model. Data points are averages over logarithmic bins of the indegree $k_{\rm in}$ or outdegree $k_{\rm out}$.

(PDF) 

Figure S19 Mean number of the four closed triples for nodes with the same indegree in the Slashdot social network and the model. Results for the mean number of closed triples corresponding to (a) FB, (b) $FF_a$, (c) $FF_b$ and (d) $FF_c$ loops for nodes with the same indegree are shown for the Slashdot social network (black squares) and the simulated network (red circles) based on the model. Analytic treatment (see Eq. (28)) gives a scaling behavior with exponent 2, as indicated by the dashed line. Data points are averages over logarithmic bins of the indegree $k_{\rm in}$.

(PDF) 

Figure S20 Mean number of the four closed triples for nodes with the same indegree in the Flickr social network and the model. Results for the mean number of closed triples corresponding to (a) FB, (b) $FF_a$, (c) $FF_b$ and (d) $FF_c$ loops for nodes with the same indegree are shown for the Flickr social network (black squares) and the simulated network (red circles) based on the model. Analytic treatment (see Eq. (28)) gives a scaling behavior with exponent 2, as indicated by the dashed line. Data points are averages over logarithmic bins of the indegree $k_{\rm in}$.
(PDF) 

Figure S21 Mean number of the four closed triples for nodes with the same indegree in the YouTube social network and the model. Results for the mean number of closed triples corresponding to (a) FB, (b) $FF_a$, (c) $FF_b$ and (d) $FF_c$ loops for nodes with the same indegree are shown for the YouTube social network (black squares) and the simulated network (red circles) based on the model. Analytic treatment (see Eq. (28)) gives a scaling behavior with exponent 2, as indicated by the dashed line. Data points are averages over logarithmic bins of the indegree $k_{\rm in}$.

(PDF) 

Figure S22 Mean number of the four closed triples for nodes with the same outdegree in the Slashdot social network and the model. Results for the mean number of closed triples corresponding to (a) FB, (b) $FF_a$, (c) $FF_b$ and (d) $FF_c$ loops for nodes with the same outdegree are shown for the Slashdot social network (black squares) and the simulated network (red circles) based on the model. Analytic treatment (see Eq. (28)) gives a scaling behavior with exponent 2, as indicated by the dashed line. Data points are averages over logarithmic bins of the outdegree $k_{\rm out}$.
(PDF) 

Figure S23 Mean number of the four closed triples for nodes with the same outdegree in the Flickr social network and the model. Results for the mean number of closed triples corresponding to (a) FB, (b) $FF_a$, (c) $FF_b$ and (d) $FF_c$ loops for nodes with the same outdegree are shown for the Flickr social network (black squares) and the simulated network (red circles) based on the model. Analytic treatment (see Eq. (28)) gives a scaling behavior with exponent 2, as indicated by the dashed line. Data points are averages over logarithmic bins of the outdegree $k_{\rm out}$.
(PDF) 

Figure S24 Mean number of the four closed triples for nodes with the same outdegree in the YouTube social network and the model. Results for the mean number of closed triples corresponding to (a) FB, (b) $FF_a$, (c) $FF_b$ and (d) $FF_c$ loops for nodes with the same outdegree are shown for the YouTube social network (black squares) and the simulated network (red circles) based on the model. Analytic treatment (see Eq. (28)) gives a scaling behavior with exponent 2, as indicated by the dashed line. Data points are averages over logarithmic bins of the outdegree $k_{\rm out}$.
(PDF) 
Table 1. Basic statistics of the four online social network datasets. Properties of each network: number of users $N$, number of directed links $E$, maximum indegree $k_{\rm in}^{\max}$, maximum outdegree $k_{\rm out}^{\max}$, reciprocity $r$, number of feedback (FB) loops $N_{FB}$ and number of feedforward loops $N_{FF}$. The numbers of the three feedforward loops ($FF_a$, $FF_b$, $FF_c$) are equal, because every $FF_a$ loop from the perspective of the focal node constitutes an $FF_b$ loop and an $FF_c$ loop from the perspective of the other two nodes.

Data set | $N$ | $E$ | $k_{\rm in}^{\max}$ | $k_{\rm out}^{\max}$ | $r$ | $N_{FB}$ | $N_{FF}$
Epinions | 75,879 | 508,825 | 3035 | 1801 | 0.25 | 740,310 | 3,586,403
Slashdot | 82,168 | 870,161 | 2552 | 2510 | 0.73 | 899,316 | 2,881,727
Flickr | 1,715,255 | 22,613,980 | 16255 | 26185 | 0.45 | 435,829,822 | 1,667,179,686
YouTube | 1,138,499 | 4,945,382 | 25519 | 28644 | 0.65 | 5,320,127 | 16,287,794



Figure Legends 

Figure 1 Three possible unclosed triples and four basic closed triples for a focal node (red).

Figure 2 Indegree (a) and outdegree (b) distributions of the Epinions social network (black squares) and simulation results (red circles) based on the model.

Figure 3 Relationship between the indegree and the outdegree of nodes in the Epinions social network and the model.

Figure 4 Reciprocal degree distributions of the Epinions social network and the model.

Figure 5 Mean reciprocal degree of nodes with (a) the same indegree and (b) the same outdegree in the Epinions social network and in the model.

Figure 6 Distributions of four basic closed triples in the Epinions social network and the model.

Figure 7 Degree correlations in the Epinions social network and the model.

Figure 8 Mean number of the four closed triples for nodes with the same indegree in the Epinions social network and the model.

Figure 9 Mean number of the four closed triples for nodes with the same outdegree in the Epinions social network and the model.

Figure 10 Values of the $\gamma$-exponents for various distributions.
Figure 1. Three possible unclosed triples and four basic closed triples for a focal node (red). The basic closed triples correspond to one feedback (FB) loop and three feedforward (FF) loops. The three feedforward loops differ in the indegree $k_{\rm in}$ of the focal node within the loop: $k_{\rm in} = 0$ for $FF_a$, $k_{\rm in} = 1$ for $FF_b$ and $k_{\rm in} = 2$ for $FF_c$. The numbers of the three feedforward loops are equal because every $FF_a$ loop from the perspective of the focal node constitutes an $FF_b$ loop and an $FF_c$ loop from the perspective of the other two nodes, although the loops may arise from different growth histories.
Figure 2. Indegree (a) and outdegree (b) distributions of the Epinions social network (black squares) and simulation results (red circles) based on the model. The dashed lines in both panels have a slope of -2.17, as suggested by the analytic results in Eqs. (17) and (31). The simulated network is generated by the model with the parameters $N = 75879$, $m \approx 4.34$ and $p \approx 0.08$, as determined by the mean degree $\langle k \rangle$ and reciprocity $r$ of the Epinions social network. Data points are averages over logarithmic bins of the indegree $k_{\rm in}$ and outdegree $k_{\rm out}$, respectively.
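The captions state that data points are averages over logarithmic bins. One simple way to do this is sketched below, under the assumption of base-2 bins $[2^b, 2^{b+1})$ over positive integer values; the bin base actually used for the figures is not specified in the text. For a distribution, the count in each bin is normalized by the number of samples and by the number of integers the bin covers; for a conditional mean such as $\langle k_r \rangle$ versus $k_{\rm in}$, the second quantity is averaged over all nodes whose first quantity falls in the bin. The function names are hypothetical.

```python
from collections import defaultdict


def log_bin_index(x):
    """Index b of the base-2 bin [2**b, 2**(b+1)) that contains integer x >= 1."""
    return int(x).bit_length() - 1


def log_binned_distribution(samples):
    """Return (low, high, P) per bin, where P is the fraction of samples in the
    bin divided by the number of integers it covers (the mean of P(x) over it)."""
    values = [int(x) for x in samples if x >= 1]  # zero degrees have no log bin
    counts = defaultdict(int)
    for x in values:
        counts[log_bin_index(x)] += 1
    n = len(values)
    return [(2 ** b, 2 ** (b + 1) - 1, counts[b] / (n * 2 ** b)) for b in sorted(counts)]


def log_binned_mean(xs, ys):
    """Return (low, high, mean of y) for the nodes whose x falls in each bin,
    e.g. x = k_in and y = k_r for curves like those of Figure 5."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, y in zip(xs, ys):
        if x >= 1:
            b = log_bin_index(x)
            sums[b] += y
            counts[b] += 1
    return [(2 ** b, 2 ** (b + 1) - 1, sums[b] / counts[b]) for b in sorted(counts)]
```

Dividing by the bin width keeps the binned values comparable with the unbinned $P(k)$, so that a power law keeps its slope on a log-log plot while the noisy tail is smoothed.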




Figure 3. Relationship between the indegree and the outdegree of nodes in the Epinions social network and the model. Results of the Epinions social network (black squares) and simulation results (red circles) based on the model are shown. The blue dashed line represents $k_{\rm in} = k_{\rm out}$. Data points are averages over logarithmic bins of the indegree $k_{\rm in}$.
Figure 4. Reciprocal degree distributions of the Epinions social network and the model. Results of the Epinions social network (black squares) and simulation results (red circles) based on the model are shown. Analytic treatment (see Eqs. (17) and (31)) suggests a scaling behavior with a slope of -2.17, as shown by the dashed line. Data points are averages over logarithmic bins of the reciprocal degree $k_r$.
Figure 5. Mean reciprocal degree of nodes with (a) the same indegree and (b) the same outdegree in the Epinions social network and in the model. Results of the Epinions social network (black squares) and simulation results (red circles) based on the model are shown on a log-log scale in the main panels. Analytic treatment suggests that $\langle k_r \rangle$ depends linearly on $k_{\rm in}$ and $k_{\rm out}$; the blue dashed lines of slope 1 indicate this dependence. The inset in each panel shows the results on a linear scale, where the dashed line has a slope of 0.3, as given by Eqs. (20) and (31). Data points are averages over logarithmic bins of the indegree $k_{\rm in}$ and outdegree $k_{\rm out}$, respectively.
Figure 6. Distributions of four basic closed triples in the Epinions social network and the model. Distributions of closed triples corresponding to (a) FB, (b) $FF_a$, (c) $FF_b$ and (d) $FF_c$ loops in the Epinions social network (black squares) and in the simulated network based on the model (red circles). Analytic treatment (see Eqs. (30) and (31)) suggests a scaling behavior with a slope of -1.58, as shown by the dashed lines. Data points are averages over logarithmic bins of $n_{FB}$, $n_{FF_a}$, $n_{FF_b}$ and $n_{FF_c}$, respectively.
Figure 7. Degree correlations in the Epinions social network and the model. Degree correlations as measured by four average nearest-neighbor degrees, $\langle k_{nn}^{\rm in}(k_{\rm in}) \rangle$ (squares), $\langle k_{nn}^{\rm out}(k_{\rm in}) \rangle$ (circles), $\langle k_{nn}^{\rm out}(k_{\rm out}) \rangle$ (triangles) and $\langle k_{nn}^{\rm in}(k_{\rm out}) \rangle$ (inverted triangles), for (a) the Epinions social network and (b) the simulated network based on the model. Data points are averages over logarithmic bins of the indegree $k_{\rm in}$ or outdegree $k_{\rm out}$.
Figure 8. Mean number of the four closed triples for nodes with the same indegree in the Epinions social network and the model. Results for the mean number of closed triples corresponding to (a) FB, (b) $FF_a$, (c) $FF_b$ and (d) $FF_c$ loops for nodes with the same indegree are shown for the Epinions social network (black squares) and the simulated network (red circles) based on the model. Analytic treatment (see Eq. (28)) gives a scaling behavior with exponent 2, as indicated by the dashed lines. Data points are averages over logarithmic bins of the indegree $k_{\rm in}$.
Figure 9. Mean number of the four closed triples for nodes with the same outdegree in the Epinions social network and the model. Results for the mean number of closed triples corresponding to (a) FB, (b) $FF_a$, (c) $FF_b$ and (d) $FF_c$ loops for nodes with the same outdegree are shown for the Epinions social network (black squares) and the simulated network (red circles) based on the model. Analytic treatment (see Eq. (28)) gives a scaling behavior with exponent 2, as indicated by the dashed lines. Data points are averages over logarithmic bins of the outdegree $k_{\rm out}$.
Figure 10. Values of the $\gamma$-exponents for various distributions. Values of the $\gamma$-exponents of the various distributions, determined by maximum likelihood estimation, plotted against $1/(\langle k \rangle - 1)$ for each of the four large-scale online social networks and the corresponding simulated networks based on the model. The lines are only guides to the eye.
Appendix S1: Emergence of scale-free close-knit friendship structure in online social networks

This Appendix is divided into three sections. In Sec. 1, we give the exponents of power-law fits carried out by maximum likelihood estimation. In Sec. 2, we present the degree correlations measured by the Pearson correlation coefficient. In Sec. 3, statistical properties of Slashdot, Flickr and YouTube are presented.
Table S1. The exponents of various distributions obtained by power-law fits of real online social networks and the simulated networks based on the model using maximum likelihood estimation. $x_{\min}$ is the lower bound of the range for fitting a power-law distribution, $\gamma$ is the corresponding exponent and KS is the goodness-of-fit value based on the Kolmogorov-Smirnov statistic.
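Epinions and model
Distribution | $\gamma$ (Epinions) | $x_{\min}$ (Epinions) | KS (Epinions) | $\gamma$ (model) | $x_{\min}$ (model) | KS (model)
$P(k_r)$ | … | … | 0.01 | 2.1 | 4 | 0.025
$P(n_{FB})$ | 1.37 | 2 | 0.034 | 1.47 | 3 | 0.006
$P(n_{FF_a})$ | 1.39 | 2 | 0.04 | 1.46 | 5 | 0.009
$P(n_{FF_b})$ | 1.35 | 2 | 0.029 | 1.46 | 2 | 0.008
$P(n_{FF_c})$ | 1.38 | 3 | 0.038 | 1.46 | 5 | 0.009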






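Slashdot and model
Distribution | $\gamma$ (Slashdot) | $x_{\min}$ (Slashdot) | KS (Slashdot) | $\gamma$ (model) | $x_{\min}$ (model) | KS (model)
$P(k_{\rm in})$ | 1.67 | 2 | 0.045 | 1.83 | 3 | 0.004
$P(k_{\rm out})$ | 1.64 | 2 | 0.047 | 1.83 | 3 | 0.003
$P(k_r)$ | 1.68 | 2 | 0.047 | 1.85 | 2 | 0.004
$P(n_{FB})$ | 1.55 | 6 | 0.027 | 1.38 | 4 | 0.009
$P(n_{FF_a})$ | 1.53 | 6 | 0.029 | 1.38 | 4 | 0.015
$P(n_{FF_b})$ | 1.54 | 6 | 0.03 | 1.38 | 4 | 0.008
$P(n_{FF_c})$ | 1.50 | 6 | 0.034 | 1.38 | 4 | 0.017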






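Flickr and model
Distribution | $\gamma$ (Flickr) | $x_{\min}$ (Flickr) | KS (Flickr) | $\gamma$ (model) | $x_{\min}$ (model) | KS (model)
$P(k_{\rm in})$ | 1.71 | 2 | 0.015 | 1.8 | 4 | 0.004
$P(k_{\rm out})$ | 1.72 | 5 | 0.03 | 1.8 | 4 | 0.005
$P(k_r)$ | 1.78 | 5 | 0.025 | 1.86 | 1 | 0.006
$P(n_{FB})$ | 1.38 | 2 | 0.029 | 1.36 | 4 | 0.007
$P(n_{FF_a})$ | 1.38 | 5 | 0.029 | 1.35 | 4 | 0.01
$P(n_{FF_b})$ | 1.37 | 2 | 0.029 | 1.36 | 4 | 0.007
$P(n_{FF_c})$ | 1.39 | 4 | 0.033 | 1.36 | 3 | 0.009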






                 YouTube                   model
Distribution     γ      Xmin   KS          γ      Xmin   KS
P(k_in)          2.05   3      0.015       2.12   3      0.013
P(k_out)         2.08   6      0.018       2.12   3      0.015
P(k_r)           2.06   6      0.019       2.2    3      0.006
P(n_FB)          1.62   4      0.031       1.57   2      0.01
P(n_FFa)         1.62   4      0.044       1.57   4      0.01
P(n_FFb)         1.62   4      0.029       1.57   4      0.005
P(n_FFc)         1.64   4      0.042       1.57   4      0.01
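
Table S1 names the fitting procedure (maximum likelihood, a lower cutoff Xmin, and a KS goodness-of-fit value) but not its details. As a hedged illustration, the sketch below follows the widely used recipe of Clauset, Shalizi and Newman under the continuous approximation: for each candidate Xmin, estimate γ = 1 + n / Σ ln(x_i / Xmin) over the tail x_i ≥ Xmin, and keep the Xmin whose fitted tail CDF is closest to the empirical CDF in the Kolmogorov-Smirnov sense. Whether the authors used this continuous estimator or a discrete variant is not stated here, and the function names `fit_power_law` and `scan_xmin` are hypothetical.

```python
import math
import random


def fit_power_law(data, xmin):
    """Continuous-approximation MLE of gamma for the tail x >= xmin, plus its KS distance."""
    tail = sorted(x for x in data if x >= xmin)
    n = len(tail)
    log_sum = sum(math.log(x / xmin) for x in tail)
    if n < 2 or log_sum == 0:
        raise ValueError("tail too small or degenerate to fit")
    gamma = 1.0 + n / log_sum
    # KS distance: largest gap between the empirical CDF and 1 - (x/xmin)^(1 - gamma),
    # checked just below and just above each sorted data point
    ks = 0.0
    for i, x in enumerate(tail):
        model_cdf = 1.0 - (x / xmin) ** (1.0 - gamma)
        ks = max(ks, abs((i + 1) / n - model_cdf), abs(i / n - model_cdf))
    return gamma, ks


def scan_xmin(data, candidates):
    """Return (xmin, gamma, ks) for the candidate xmin with the smallest KS distance."""
    best = None
    for xmin in candidates:
        try:
            gamma, ks = fit_power_law(data, xmin)
        except ValueError:
            continue  # skip candidates that leave too few points in the tail
        if best is None or ks < best[2]:
            best = (xmin, gamma, ks)
    return best


if __name__ == "__main__":
    # Synthetic check: inverse-transform samples from a power law with gamma = 2.5, x >= 1
    rng = random.Random(42)
    sample = [(1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(20000)]
    print(scan_xmin(sample, [1.0, 2.0, 3.0]))
```

For integer-valued degrees, the discrete approximation of the same recipe replaces Xmin by Xmin - 1/2 inside the logarithm; the Xmin scan is unchanged.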






2. Degree correlation measured by Pearson correlation coefficient 



Table S2. Pearson correlation coefficients. r(in, in) quantifies the tendency of nodes with a
high indegree to be connected to other nodes with a high indegree. The other coefficients
have analogous interpretations.



Dataset     r(in, in)   r(in, out)   r(out, out)   r(out, in)
Epinions    -0.009       0.073       -0.016        -0.053
Slashdot    -0.068      -0.059       -0.064        -0.071
Flickr       0.06        0.055       -0.0025       -0.001
YouTube     -0.03       -0.032       -0.036        -0.035
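
The caption does not spell out how r(α, β) is computed from the links. One common convention for directed networks, assumed here, takes every link i -> j and computes the Pearson correlation between the α-degree of the source i and the β-degree of the target j. A minimal sketch under that assumption (the function name is hypothetical):

```python
import math
from collections import Counter


def directed_degree_correlation(edges, alpha, beta):
    """Pearson r between the alpha-degree of each link's source and the
    beta-degree of its target; alpha and beta are "in" or "out"."""
    edges = list(edges)
    degree = {
        "in": Counter(t for _, t in edges),   # indegree of every node that is a target
        "out": Counter(s for s, _ in edges),  # outdegree of every node that is a source
    }
    xs = [degree[alpha][s] for s, _ in edges]
    ys = [degree[beta][t] for _, t in edges]
    m = len(edges)
    mx, my = sum(xs) / m, sum(ys) / m
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # r is undefined when one side has constant degree
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    toy = [(0, 1), (1, 2), (2, 0), (0, 2), (3, 0)]
    for a, b in [("in", "in"), ("in", "out"), ("out", "out"), ("out", "in")]:
        print(f"r({a}, {b}) = {directed_degree_correlation(toy, a, b):.3f}")
```

Values close to zero, as in all four datasets above, indicate that the degrees at the two ends of a link are nearly uncorrelated.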



3. Statistical properties of Slashdot, Flickr and YouTube social 
networks 




Figure S1. Indegree (a) and outdegree (b) distributions of the Slashdot social network
(black squares) and simulation results (red circles) based on the model. The dashed lines in
both panels have slope -2.1, as suggested by the analytic results in Eqs. (17) and (31). The simulated
network is generated by the model with parameters N = 82168, m ≈ 5.14, and p ≈ 0.67, as
determined by the mean degree <k> and reciprocity of the Slashdot social network. Data points are
averages over the logarithmic bins of the indegree k_in and outdegree k_out, respectively.
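
The data points in these figures are averaged over logarithmic bins, but the bin edges are not given. The sketch below assumes bins [1, 2), [2, 4), [4, 8), ... (base 2) and divides each count by the number of values times the bin width, so the binned points estimate P(k) on a log-log plot. The function name and the binning choices are assumptions.

```python
import bisect
import math


def log_binned_pdf(values, base=2.0, start=1.0):
    """Histogram over bins [start*base^j, start*base^(j+1)), each count divided by
    (number of values) * (bin width); returns (geometric bin centre, density) pairs."""
    xs = sorted(v for v in values if v >= start)  # zero degrees cannot sit on a log axis
    n = len(xs)
    out = []
    lo = start
    while xs and lo <= xs[-1]:
        hi = lo * base
        count = bisect.bisect_left(xs, hi) - bisect.bisect_left(xs, lo)
        if count:  # empty bins are skipped because they cannot be drawn on a log scale
            out.append((math.sqrt(lo * hi), count / (n * (hi - lo))))
        lo = hi
    return out


if __name__ == "__main__":
    degrees = [1, 1, 2, 3, 4, 7, 8]
    for centre, density in log_binned_pdf(degrees):
        print(f"{centre:7.3f}  {density:.4f}")
```

Dividing by the bin width keeps the binned density comparable across bins of growing size, so a power law P(k) ∝ k^(-γ) still appears as a straight line of slope -γ.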
Figure S2. Indegree (a) and outdegree (b) distributions of the Flickr social network (black
squares) and simulation results (red circles) based on the model. The dashed lines in both
panels have slope -2.08, as suggested by the analytic results in Eqs. (17) and (31). The simulated
network is generated by the model with parameters N ≈ 100000, m ≈ 8.07, and p ≈ 0.39, as
determined by the mean degree <k> and reciprocity of the Flickr social network. Data points are
averages over the logarithmic bins of the indegree k_in and outdegree k_out, respectively.
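
The model parameters m and p are fixed by the mean degree <k> and the reciprocity of each network; that mapping is given by the main-text equations and is not reproduced here. The sketch below only measures the two empirical inputs from an edge list, assuming that reciprocity means the fraction of directed links whose reverse link is also present (a common definition, assumed here) and that <k> is the mean in/outdegree. The function name is hypothetical.

```python
def mean_degree_and_reciprocity(edges):
    """Return (N, mean in/outdegree, reciprocity) of a directed graph.

    N counts only nodes that appear in at least one link; self-loops and
    duplicate links are dropped. Reciprocity is the fraction of links whose
    reverse link is also present.
    """
    links = {(s, t) for s, t in edges if s != t}
    nodes = {v for link in links for v in link}
    # the sum of indegrees and the sum of outdegrees both equal the number of links
    mean_k = len(links) / len(nodes)
    reciprocated = sum(1 for s, t in links if (t, s) in links)
    return len(nodes), mean_k, reciprocated / len(links)


if __name__ == "__main__":
    print(mean_degree_and_reciprocity([(0, 1), (1, 0), (1, 2), (2, 3)]))
```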
Figure S3. Indegree (a) and outdegree (b) distributions of the YouTube social network
(black squares) and simulation results (red circles) based on the model. The dashed lines in
both panels have slope -2.3, as suggested by the analytic results in Eqs. (17) and (31). The simulated
network is generated by the model with parameters N = 100000, m ≈ 4.34, and p ≈ 0.08, as
determined by the mean degree <k> and reciprocity of the YouTube social network. Data points are
averages over the logarithmic bins of the indegree k_in and outdegree k_out, respectively.
Figure S4. Relationship between the indegree and the outdegree of nodes in the Slashdot
social network and the model. Results of the Slashdot social network (black squares) and
simulation results (red circles) based on the model are shown. The blue dashed line represents
k_in = k_out. Data points are averages over the logarithmic bins of the indegree k_in.
Figure S5. Relationship between the indegree and the outdegree of nodes in the Flickr
social network and the model. Results of the Flickr social network (black squares) and simulation
results (red circles) based on the model are shown. The blue dashed line represents
k_in = k_out. Data points are averages over the logarithmic bins of the indegree k_in.
Figure S6. Relationship between the indegree and the outdegree of nodes in the YouTube
social network and the model. Results of the YouTube social network (black squares) and
simulation results (red circles) based on the model are shown. The blue dashed line represents
k_in = k_out. Data points are averages over the logarithmic bins of the indegree k_in.
Figure S7. Reciprocal degree distributions of the Slashdot social network and the model.

Results of the Slashdot social network (black squares) and simulation results (red circles) based on the
model are shown. Analytic treatment (see Eqs. (17) and (31)) suggests scaling with
exponent -2.1, as shown by the dashed line. Data points are averages over the logarithmic bins of the
reciprocal degree k_r.
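
The reciprocal degree k_r of a node is read here as the number of its neighbours that are linked to it in both directions; that reading, and the helper name `node_degrees`, are assumptions of this sketch. It counts k_in, k_out and k_r for every node of an edge list, after which the distribution of k_r can be binned like P(k_in).

```python
from collections import defaultdict


def node_degrees(edges):
    """Return {node: (k_in, k_out, k_r)} for a directed graph given as (source, target) pairs."""
    links = {(s, t) for s, t in edges if s != t}  # drop duplicate links and self-loops
    k_in, k_out, k_r = defaultdict(int), defaultdict(int), defaultdict(int)
    for s, t in links:
        k_out[s] += 1
        k_in[t] += 1
        if (t, s) in links:
            k_r[s] += 1  # the reverse link (t, s) is also visited, crediting t once
    nodes = set(k_in) | set(k_out)
    return {v: (k_in[v], k_out[v], k_r[v]) for v in nodes}


if __name__ == "__main__":
    print(node_degrees([(0, 1), (1, 0), (1, 2), (2, 3)]))
```

Under this reading k_r ≤ min(k_in, k_out) for every node.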
Figure S8. Reciprocal degree distributions of the Flickr social network and the model.

Results of the Flickr social network (black squares) and simulation results (red circles) based on the
model are shown. Analytic treatment (see Eqs. (17) and (31)) suggests scaling with
exponent -2.08, as shown by the dashed line. Data points are averages over the logarithmic bins of the
reciprocal degree k_r.
Figure S9. Reciprocal degree distributions of the YouTube social network and the model.

Results of the YouTube social network (black squares) and simulation results (red circles) based on the
model are shown. Analytic treatment (see Eqs. (17) and (31)) suggests scaling with
exponent -2.3, as shown by the dashed line. Data points are averages over the logarithmic bins of the
reciprocal degree k_r.
Figure S10. Mean reciprocal degree of nodes with (a) the same indegree and (b) the same
outdegree in the Slashdot social network and in the model. Results of the Slashdot social
network (black squares) and simulation results (red circles) based on the model are shown on a log-log
scale in the main panels. Analytic treatment suggests that <k_r> depends linearly on k_in and k_out,
and the blue dashed lines of slope 1 indicate this dependence. The inset in each panel shows the results on a
linear scale, where the dashed line has slope 0.82, as given by Eqs. (20) and (31). Data points are
averages over the logarithmic bins of the indegree k_in and outdegree k_out, respectively.
Figure S11. Mean reciprocal degree of nodes with (a) the same indegree and (b) the same
outdegree in the Flickr social network and in the model. Results of the Flickr social network
(black squares) and simulation results (red circles) based on the model are shown on a log-log scale in
the main panels. Analytic treatment suggests that <k_r> depends linearly on k_in and k_out, and the
blue dashed lines of slope 1 indicate this dependence. The inset in each panel shows the results on a linear
scale, where the dashed line has slope 0.59, as given by Eqs. (20) and (31). Data points are averages
over the logarithmic bins of the indegree k_in and outdegree k_out, respectively.



Figure S12. Mean reciprocal degree of nodes with (a) the same indegree and (b) the same
outdegree in the YouTube social network and in the model. Results of the YouTube social
network (black squares) and simulation results (red circles) based on the model are shown on a log-log
scale in the main panels. Analytic treatment suggests that <k_r> depends linearly on k_in and k_out,
and the blue dashed lines of slope 1 indicate this dependence. The inset in each panel shows the results on a
linear scale, where the dashed line has slope 0.73, as given by Eqs. (20) and (31). Data points are
averages over the logarithmic bins of the indegree k_in and outdegree k_out, respectively.
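
The insets in Figures S10 to S12 compare the data with straight lines whose slopes (0.82, 0.59 and 0.73) come from Eqs. (20) and (31), which are not reproduced here. One hedged way to extract the corresponding empirical slope is a least-squares fit of <k_r> = c k_in through the origin, c = Σ x_i y_i / Σ x_i^2. The helper name and the toy values in the example are hypothetical.

```python
def slope_through_origin(x, y):
    """Least-squares slope c of the line y = c * x (no intercept)."""
    sxx = sum(xi * xi for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least one nonzero value")
    return sum(xi * yi for xi, yi in zip(x, y)) / sxx


if __name__ == "__main__":
    # hypothetical (k_in, <k_r>) pairs, not data from the paper
    k_in = [1, 2, 4, 8, 16]
    mean_kr = [0.8, 1.5, 3.0, 5.9, 11.6]
    print(f"c = {slope_through_origin(k_in, mean_kr):.3f}")
```

Because the fit minimizes absolute residuals, it is dominated by the largest x values; fitting per-node pairs instead of binned averages weights each node equally.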
Figure S13. Distributions of the four basic closed triples in the Slashdot social network and
the model. Distributions of closed triples corresponding to (a) FB, (b) FFa, (c) FFb, and (d) FFc
loops in the Slashdot social network (black squares) and in the simulated network based on the model
(red circles). Analytic treatment (see Eqs. (30) and (31)) suggests scaling with exponent
-1.55, as shown by the dashed lines. Data points are averages over the logarithmic bins of
n_FB, n_FFa, n_FFb, and n_FFc, respectively.
Figure S14. Distributions of the four basic closed triples in the Flickr social network and the
model. Distributions of closed triples corresponding to (a) FB, (b) FFa, (c) FFb, and (d) FFc loops
in the Flickr social network (black squares) and in the simulated network based on the model (red
circles). Analytic treatment (see Eqs. (30) and (31)) suggests scaling with exponent
-1.54, as shown by the dashed lines. Data points are averages over the logarithmic bins of
n_FB, n_FFa, n_FFb, and n_FFc, respectively.
Figure S15. Distributions of the four basic closed triples in the YouTube social network and
the model. Distributions of closed triples corresponding to (a) FB, (b) FFa, (c) FFb, and (d) FFc
loops in the YouTube social network (black squares) and in the simulated network based on the model
(red circles). Analytic treatment (see Eqs. (30) and (31)) suggests scaling with exponent
-1.65, as shown by the dashed lines. Data points are averages over the logarithmic bins of
n_FB, n_FFa, n_FFb, and n_FFc, respectively.



Figure S16. Degree correlations in the Slashdot social network and the model. Degree
correlations are measured by four quantities based on the average nearest-neighbor degree,
<k_nn^in(k_in)> (squares), <k_nn^out(k_in)> (circles), <k_nn^out(k_out)> (triangles), and <k_nn^in(k_out)> (inverted
triangles), for (a) the Slashdot social network and (b) the simulated network based on the model. Data points
are averages over the logarithmic bins of the indegree k_in or outdegree k_out.
Figure S17. Degree correlations in the Flickr social network and the model. Degree
correlations are measured by four quantities based on the average nearest-neighbor degree,
<k_nn^in(k_in)> (squares), <k_nn^out(k_in)> (circles), <k_nn^out(k_out)> (triangles), and <k_nn^in(k_out)> (inverted
triangles), for (a) the Flickr social network and (b) the simulated network based on the model. Data points are
averages over the logarithmic bins of the indegree k_in or outdegree k_out.
Figure S18. Degree correlations in the YouTube social network and the model. Degree
correlations are measured by four quantities based on the average nearest-neighbor degree,
<k_nn^in(k_in)> (squares), <k_nn^out(k_in)> (circles), <k_nn^out(k_out)> (triangles), and <k_nn^in(k_out)> (inverted
triangles), for (a) the YouTube social network and (b) the simulated network based on the model. Data points
are averages over the logarithmic bins of the indegree k_in or outdegree k_out.
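
These captions do not define which neighbours enter the average nearest-neighbor degree of a directed network. The sketch below assumes that the neighbours of a node are the nodes it links to (its out-neighbours). For each node it computes the mean β-degree of those neighbours and then averages over all nodes sharing the same α-degree, giving <k_nn^β(k_α)>. Both the neighbour choice and the function name `knn_by_degree` are assumptions.

```python
from collections import defaultdict


def knn_by_degree(edges, by="in", of="in"):
    """Hypothetical <k_nn^of(k_by)>: mean `of`-degree of each node's out-neighbours,
    averaged over all nodes that share the same `by`-degree."""
    links = {(s, t) for s, t in edges if s != t}
    deg = {"in": defaultdict(int), "out": defaultdict(int)}
    successors = defaultdict(list)
    for s, t in links:
        deg["out"][s] += 1
        deg["in"][t] += 1
        successors[s].append(t)
    groups = defaultdict(list)
    for node, targets in successors.items():  # nodes without out-links are skipped
        local = sum(deg[of][t] for t in targets) / len(targets)
        groups[deg[by][node]].append(local)
    return {k: sum(vals) / len(vals) for k, vals in sorted(groups.items())}


if __name__ == "__main__":
    toy = [(0, 1), (0, 2), (1, 2), (3, 2)]
    print(knn_by_degree(toy, by="out", of="in"))
```

A curve that stays flat as k_α grows indicates that neighbour degrees do not depend on a node's own degree, consistent with the weak Pearson coefficients in Table S2.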
Figure S19. Mean number of the four closed triples for nodes with the same indegree in
the Slashdot social network and the model. Results for the mean number of closed triples
corresponding to (a) FB, (b) FFa, (c) FFb, and (d) FFc loops for nodes with the same indegree are
shown for the Slashdot social network (black squares) and the simulated network (red circles) based on the
model. Analytic treatment (see Eq. (28)) gives scaling with exponent 2, as indicated by
the dashed line. Data points are averages over the logarithmic bins of the indegree k_in.
Figure S20. Mean number of the four closed triples for nodes with the same indegree in
the Flickr social network and the model. Results for the mean number of closed triples
corresponding to (a) FB, (b) FFa, (c) FFb, and (d) FFc loops for nodes with the same indegree are
shown for the Flickr social network (black squares) and the simulated network (red circles) based on the
model. Analytic treatment (see Eq. (28)) gives scaling with exponent 2, as indicated by
the dashed line. Data points are averages over the logarithmic bins of the indegree k_in.
Figure S21. Mean number of the four closed triples for nodes with the same indegree in
the YouTube social network and the model. Results for the mean number of closed triples
corresponding to (a) FB, (b) FFa, (c) FFb, and (d) FFc loops for nodes with the same indegree are
shown for the YouTube social network (black squares) and the simulated network (red circles) based on the
model. Analytic treatment (see Eq. (28)) gives scaling with exponent 2, as indicated by
the dashed line. Data points are averages over the logarithmic bins of the indegree k_in.
Figure S22. Mean number of the four closed triples for nodes with the same outdegree in
the Slashdot social network and the model. Results for the mean number of closed triples
corresponding to (a) FB, (b) FFa, (c) FFb, and (d) FFc loops for nodes with the same outdegree are
shown for the Slashdot social network (black squares) and the simulated network (red circles) based on the
model. Analytic treatment (see Eq. (28)) gives scaling with exponent 2, as indicated by
the dashed line. Data points are averages over the logarithmic bins of the outdegree k_out.
Figure S23. Mean number of the four closed triples for nodes with the same outdegree in
the Flickr social network and the model. Results for the mean number of closed triples
corresponding to (a) FB, (b) FFa, (c) FFb, and (d) FFc loops for nodes with the same outdegree are
shown for the Flickr social network (black squares) and the simulated network (red circles) based on the
model. Analytic treatment (see Eq. (28)) gives scaling with exponent 2, as indicated by
the dashed line. Data points are averages over the logarithmic bins of the outdegree k_out.
Figure S24. Mean number of the four closed triples for nodes with the same outdegree in
the YouTube social network and the model. Results for the mean number of closed triples
corresponding to (a) FB, (b) FFa, (c) FFb, and (d) FFc loops for nodes with the same outdegree are
shown for the YouTube social network (black squares) and the simulated network (red circles) based on the
model. Analytic treatment (see Eq. (28)) gives scaling with exponent 2, as indicated by
the dashed line. Data points are averages over the logarithmic bins of the outdegree k_out.



